Looks Good, Reads Bad: Imaging 5.25-inch floppy disks on mismatched drives

Leontien Talboom

In digital preservation, ideal conditions are rare. We often don’t have the luxury of choosing the ‘correct’ drive for our removable carriers, especially when dealing with legacy media like 5.25-inch floppies. Drives and disks may be unlabelled, disk annotations may be incomplete, and the reading software can produce misleading results.

A selection of disks from our test collection. The labelling on some is very specific, going down to the operating system they were created on. Some labels give an idea of how the disk was formatted (HD meaning High Density), while others are more generic, saying things like ‘Tax Book’.

In a lot of cases we end up with a bunch of disks where it is unclear what drive they were written on, and we are lucky if we are able to acquire any drive at all. The Transfer Service at Cambridge University Library (CUL) has been in the very privileged position of having acquired a number of 5.25-inch floppy drives. Recently I have been able to get a 40-track drive up and running, which means that up until now all our Double Density disks (probably created on 40-track drives) were read on 80-track drives. Within the retro computing community there is the understanding that it is best to read a disk on the type of drive it was created on, but that is definitely not always possible for us, especially if we only have one drive at our disposal.

After examining some Double Density disks with Chris Knowles from Churchill Archives Centre on an 80-track drive, we were wondering if differences would actually show up when using different drives and settings. This got us thinking: what actually happens when we mismatch drives and disks? This blog post provides an overview of some testing that I did to see what happens when 40-track (also referred to as Double Density) and 80-track (High Density) 5.25-inch disks are read on mismatched drives, using a GreaseWeazle floppy controller and visualised with HxC Floppy Emulator. It highlights what to pay attention to, what can be misleading, and how subtle data loss can creep in unnoticed.

Context and Setup

All examples in this post will use DOS-formatted floppy disks from our test collection. I know these disks read well and use them to test other drives and new workflows. However, this does not reflect our actual collection material, which includes many weird and wonderful disks. What it does do is make it possible to access the files on the floppy disks, as many tools support DOS-formatted disks, and highlight some of the false positives in a more meaningful way.

The focus of this post will be on the 5.25-inch drives, both the 40-track and 80-track variants. While similar principles apply to 3.5-inch floppy disk drives, they add more complexity due to variations in revolutions per minute (RPM), materials and formats. These are well worth having a look at in a future blog, but if you are interested in their history, I would highly recommend this video.

Just as a very quick background, early 5.25-inch floppy drives used 40 tracks with 48 tracks per inch (TPI). Later drives supported 80 tracks with 96 TPI, enabling higher densities. The disk surface is physically the same, but the track spacing is tighter on 80-track drives. This means that in general 80-track drives can read 40-track disks, but not the other way around. This is also why 80-track drives can be referred to as High Density drives and 40-track drives as Low Density drives, but there are a few exceptions and therefore I will stick to 40-track and 80-track terminology for this post.
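To put rough numbers on the difference in track spacing, here is a small sketch that converts a TPI value into a nominal track pitch in millimetres. The function name is made up for this illustration, and the result is only the nominal centre-to-centre spacing, not a measurement of any particular drive.

```shell
#!/bin/sh
# Nominal centre-to-centre track spacing in millimetres for a given
# tracks-per-inch value (1 inch = 25.4 mm).
# track_pitch_mm is a hypothetical helper name, not part of any tool.
track_pitch_mm() {
    LC_ALL=C awk -v tpi="$1" 'BEGIN { printf "%.3f\n", 25.4 / tpi }'
}
```

Running track_pitch_mm 48 prints 0.529 and track_pitch_mm 96 prints 0.265, so the head of an 80-track drive moves half the distance per step that the head of a 40-track drive does.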

Also, as an aside, it can be really difficult to know if you have a 40-track or an 80-track drive. Our 80-track drive has a serial number on it, whilst our 40-track drive has no labelling. I only know it is a 40-track drive because of testing. The image below shows that this difference is impossible to spot visually: there is so much variation in the manufacturing of 5.25-inch drives that telling, without labels, what types of floppy disks a drive can read seems unfeasible.

Two drives in our lab: the one on the left is the 80-track drive, the one on the right is the 40-track drive.

Reading a 40-track (Double Density) Disk on a 40-track Drive

The first test is to read a disk on a drive that is actually meant to match: a Double Density disk with 40 tracks on a 40-track drive. This disk was created on the IBM System/360, a DOS-formatted system. For all the outputs in this blog I will be creating a flux stream in the SCP format, which captures the raw magnetic flux on the disks. The SCP format is one of the options on the GreaseWeazle which we use for our workflows at CUL. More information on the different options on the GreaseWeazle can be found here.

gw read NameOfDisk.scp --drive=0

This is the command to use in the terminal. Note that our 40-track drive is found on Drive 0; this may be different for your setup.

HxC output reading a 40-track (Double Density) disk on a 40-track drive with the default settings on a GreaseWeazle for a raw SCP flux stream.

Above you can see the output visualised in HxC. This may look wrong, as there is a large inner red ring on the disk, but this is actually to be expected. The GreaseWeazle defaults to reading 82 tracks on a disk, which is the most floppy disks can have, meaning that it will get the data for most floppy disks out there (there are always exceptions in the world of floppy disks and I am therefore hesitant to say all). However, as this drive can only read 40 tracks, it will read empty data for the remainder of the command.

When accessing these files in HxC Floppy Emulator (which can look at files on DOS-formatted disks), the tool is able to parse the files and make them viewable. But you can also clean up the read by specifying the number of tracks on the GreaseWeazle, as we know this is a Double Density 40-track floppy disk:

gw read NameOfDisk.scp --drive=0 --tracks=c=0-39

HxC output reading a 40-track (Double Density) disk on a 40-track drive with 40 tracks specified on a GreaseWeazle for a raw SCP flux stream.

Both reads are valid, but the first one just has unneeded extra track space.

Going beyond the Magnetic Flux Stream

This will not be possible for the more obscure formats of floppy disks, but we can deduce the format of this disk and create a disk image for this floppy disk. From looking at the earlier flux streams that we made we know we have a 5.25-inch floppy disk with 40 tracks and 9 sectors. When looking this up in the handy format table in this tutorial, it can be concluded that this is an IBM 360K floppy disk and therefore, if you wanted to create a disk image for an emulator or other tools, the following command can be used:

gw read --format=ibm.360 NameOfDisk.img --drive=0
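The 360K figure follows from the disk geometry, and the arithmetic can be sketched as below. The 40 tracks and 9 sectors come from the flux view described above; the 2 sides and 512 bytes per sector are assumptions added here for the usual IBM 360K layout, and the function names are made up for this illustration.

```shell
#!/bin/sh
# Geometry of an IBM 360K disk. The 40 cylinders and 9 sectors per track
# come from the flux view; 2 sides and 512-byte sectors are assumed here.
CYLINDERS=40
SIDES=2
SECTORS_PER_TRACK=9
BYTES_PER_SECTOR=512

# Total number of sectors on the disk.
total_sectors() {
    echo $(( CYLINDERS * SIDES * SECTORS_PER_TRACK ))
}

# Expected size in bytes of a complete sector image.
expected_image_bytes() {
    echo $(( $(total_sectors) * BYTES_PER_SECTOR ))
}

# Succeeds only if the given image file has exactly the expected size.
image_has_expected_size() {
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" -eq "$(expected_image_bytes)" ]
}
```

With these values total_sectors prints 720 and expected_image_bytes prints 368640, which is 360 KiB. A check such as image_has_expected_size NameOfDisk.img only shows that the image has the right length; it says nothing about whether the sectors hold real data.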

Reading an 80-track (High Density) Disk on a 40-track Drive

This is where things get messy. When reading a High Density disk on a 40-track drive with the same default GreaseWeazle settings as the first read, an unreadable image is generated. This is because the magnetic alignment is incompatible. This doesn’t automatically mean the disk is corrupt (we are using a trusted test disk); in this case it is just not the right drive to use for this disk.

HxC output reading an 80-track (High Density) disk on a 40-track drive with default settings on a GreaseWeazle for a raw SCP flux stream.

Reading a 40-track (Double Density) Disk on an 80-track Drive

The next step is to move on to our 80-track drive. My favourite 80-track drive in our collection is a TEAC-branded one that has never let me down. Sometimes the reads can get a bit wonky on it, but a good clean always seems to fix this problem! An 80-track drive will physically read all 80 tracks, but this will look really interesting when reading a 40-track (Double Density) disk on this drive.

gw read NameOfDisk.scp --drive=B

Note that here the drive specification has changed, because our 80-track drive and ribbon cable setup is on Drive B. Again, this may be different for your workflow and setup.

HxC output reading a 40-track (Double Density) disk on an 80-track drive with default settings on a GreaseWeazle for a raw SCP flux stream.

Again this may look incorrect, but these red rings in the image stem from the 80-track drive’s finer stepping. It tried to read too narrowly over tracks laid down with wider spacing. However, this disk image is fine. HxC is also able to interpret the data and the files are accessible.

But what if you want a clean read of the data without the extra empty data created by the stepping? It is not as simple as defining that the disk has 40 tracks, as the GreaseWeazle doesn’t inherently know the physical stepping difference between the disk and the drive. However, a false positive does appear if you were to read the disk in this way:

gw read NameOfDisk.scp --drive=B --tracks=c=0-39

HxC output reading a 40-track (Double Density) disk on an 80-track drive with 40 tracks specified on a GreaseWeazle for a raw SCP flux stream.

The read looks very similar to the first read on the 80-track drive, but it just has fewer tracks on it, as 40 tracks were defined. But this also means that only half of the disk has been read: the 80-track drive steps at half the spacing, so its first 40 head positions only reach halfway across a 40-track disk. This is where the false positive comes in. When viewing the files in HxC, which can be done by using the Disk Browser in the main menu, all the files and their sizes seem to appear.

Disk Browser view of a 40-track floppy disk in HxC Floppy Emulator. Just as a fun side note here, I love the content on this floppy disk as it really shows that software and more general files can be mixed on these carriers. This is not as common with more modern carriers, such as optical discs, where users cannot reuse the software discs for their own use. This is also why it is always worth checking all carriers in your collection, even if a carrier is stated to contain just software.

This is the exact same view that you get when opening the Disk Browser for the first flux stream created on the 80-track drive, the one where no tracks were specified. But when you actually open one of the files, in this example GEOF.TXT, a difference can be seen between the two: one shows gibberish (on the left) and the other shows readable, formatted text (on the right).

GEOF.TXT extracted and opened from two different disk reads. On the left is the 40-track (Double Density) disk read on the 80-track drive with the tracks defined; on the right is the 40-track (Double Density) disk read on the 80-track drive with the default settings from the GreaseWeazle.

Why has this happened, especially considering that the file directory gave the impression that everything was there, even the size of the files? This is because most of the important information on a floppy disk (for DOS-formatted disks at least), including the directory and other crucial information, is normally found on the first few tracks. But the actual data is missing here, as only half of the disk was imaged when using the 40 tracks setting on the GreaseWeazle.

This can be fixed by using the step function on the GreaseWeazle, which skips every other track on the drive and so gets rid of the red rings seen in the earlier images. However, it should be kept in mind that the first read of the 40-track (Double Density) disk on the 80-track drive with the default GreaseWeazle settings is also fine and usable, even if it has red rings in it.

gw read NameOfDisk.scp --drive=B --tracks=c=0-39:step=2

HxC output reading a 40-track (Double Density) disk on an 80-track drive with 40 tracks and step function specified on a GreaseWeazle for a raw SCP flux stream.
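The difference between the two 40-track reads on the 80-track drive comes down to simple arithmetic, sketched below. It assumes that two head positions of the 80-track drive span one track of a 40-track disk; the function names are made up for this illustration, and the step argument stands in for the step value given in the tracks setting.

```shell
#!/bin/sh
# On an 80-track drive the head moves in half-width steps, so two drive
# head positions span one track of a 40-track disk (assumed 2:1 ratio).
DRIVE_STEPS_PER_DISK_TRACK=2

# Physical head position on the drive for a given logical cylinder
# and step value.
head_position() {
    echo $(( $1 * $2 ))
}

# Highest track of the 40-track disk that the head reaches when the
# last logical cylinder requested is $1 and the step value is $2.
last_disk_track_reached() {
    last_head=$(head_position "$1" "$2")
    echo $(( last_head / DRIVE_STEPS_PER_DISK_TRACK ))
}
```

Calling last_disk_track_reached 39 1 prints 19, so a read of cylinders 0-39 without stepping stops at disk track 19 and covers only the first 20 of the 40 tracks. Calling last_disk_track_reached 39 2 prints 39, so with a step of 2 the same range reaches the last track of the disk.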

One other pitfall

Using the 40-track drive, a disk image was created as the formatting of the disk was known, which is the IBM 360K format. Using the same settings on the 80-track drive we get a disk image that looks good, with no discernible issues:

gw read --format=ibm.360 NameOfDisk.img --drive=B

When using this command, the resulting disk image is completely green in HxC.

HxC output of the ibm.360 format, created on an 80-track drive.

When creating specific formats on a GreaseWeazle, the log in the terminal can be really helpful to give an idea of what is actually happening.

Log output in terminal from creating the ibm.360 format on the 80-track drive using the GreaseWeazle.

The log shows that only 18 of the 720 expected sectors were found, but when viewing the image in HxC it gives the impression that there are 0 bad sectors. Why is there such a mismatch? This is an inherent danger of creating sector-based disk images. Unlike a flux stream, a disk image always has data written to every sector; where the actual disk content could not be read, the sector consists of the lines ‘BAD SECTOR’ instead. So even if there is no actual data, HxC will pick this up as data, which is different from using the flux streams. This gets even more interesting when using the Disk Browser, which gives the exact same results as have been seen previously.

Disk Browser view of the IBM.360 format created on the 80-track drive on HxC Floppy Emulator.

Again, the same problem as before has happened: the first few tracks, which is where the directory is stored, did read. This highlights two pitfalls, a green read in the HxC Track Analyzer and ‘files’ in the Disk Browser. But when the GEOF.TXT file is opened, this appears:

Opened GEOF.TXT file from the IBM.360 format created on the 80-track drive

None of this is actual data. So to get a good disk image of a 40-track disk on an 80-track drive it is important to include the step function and tracks again:

gw read --format=ibm.360 NameOfDisk.img --drive=B --tracks=c=0-39:step=2

This will result in a readable and usable disk image from a 40-track disk on an 80-track drive.
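Because a sector image padded with filler still looks green in HxC, a quick scan of the image for the filler text can act as a sanity check alongside the terminal log. The sketch below assumes that unreadable sectors contain the ASCII string ‘BAD SECTOR’, as described above; the exact filler pattern and the function names are assumptions, and a genuine file that happens to contain the same words would also be counted.

```shell
#!/bin/sh
# Count how many times the filler text appears in a sector image.
# Assumption: unreadable sectors are padded with text containing the
# string "BAD SECTOR"; adjust PATTERN if your images differ.
PATTERN='BAD SECTOR'

count_filler() {
    # -a treats the binary image as text, -o prints one line per match.
    LC_ALL=C grep -a -o "$PATTERN" "$1" | wc -l | tr -d ' '
}

# Succeeds only when no filler text is found in the image.
image_is_free_of_filler() {
    [ "$(count_filler "$1")" -eq 0 ]
}
```

For example, image_is_free_of_filler NameOfDisk.img || echo "filler found, check the read log" flags an image that needs a closer look. A count of zero does not prove the read is perfect, so it is best used together with the log output and by opening a few files.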

Conclusion

In digital preservation, it’s often necessary to work with whatever drives are on hand, even if they aren’t the perfect match for the disks being read. While this can present challenges, it’s still possible to adapt and achieve successful reads across different hardware. The key is understanding that visual indicators like red or green on read outputs don’t always tell the whole story. Red areas might not always indicate a failure, and green zones might hide underlying data issues. Having as much context as possible about the disk’s format, layout, and history can make a big difference in interpreting the results more accurately.

From what I’ve seen so far, it generally seems preferable to use an 80-track drive, as it can read the widest range of disks. Using the default settings when capturing a flux stream on the GreaseWeazle gives the most usable and flexible results. It’s usually safer to preserve more data upfront, even if it includes noise, since once something is missed or lost, it can’t be retrieved.

Going forward, I would really like to do more work in this area. There are still a lot of uncertainties and I would be interested in seeing what results we get from exploring our 3.5-inch drives or other image formats supported by the GreaseWeazle, like HFE or other sector-level captures.
