Adding new formats

If you need assistance with new formats, try the MFM discussion group.

If --analyze doesn't find the format, it will need to be added. The easiest solution is to find a manual that describes the sector format, though very few give complete information such as the CRC polynomial. If I can't find the format documented, the steps I use are:

  1. Read disk into transitions file without decoding.
  2. Find what marks the start of the sector header.
  3. Find out how to extract the cylinder, head, sector, and other information from the sector header.
  4. Find how data start is marked and what header bytes are before the data.
  5. Modify the code to implement the format determined.
  6. Find the CRC polynomial, initial value, and what bytes are included.
  7. Find the various gap fill byte values and counts needed for ext2emu.

1 Read disk into transitions file without decoding

Read the disk into a transitions file without trying to decode it. This way you are not messing with the drive you want to read while trying to decode the format.

./mfm_read --transitions_file ../disk.tran --cylinders # --heads # --drive #

2 Find sector start mark

The most common way of marking the start of the sector header is a special 0xa1 pattern. Only this mark method will be covered in this document. The command below will determine if the pattern is used. This ignores the CRC for now. You will need to specify the proper file names.

mfm_util --format WD_3B1 --tran disk.tran --ext /tmp/out --quiet 1 --data_crc 0,0,32,0 --header_crc 0,0,32,0
For SA1000 type 8" disks use --format unknown2 instead.

If it prints "Got exp" messages for most tracks, then it found the sector flag byte.
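For background on why the 0xa1 pattern is special: in MFM the mark byte is conventionally written with one clock bit deliberately dropped, so the resulting cell pattern cannot occur in normally encoded data. The sketch below is not taken from the MFM tools. mfm_encode is a hypothetical helper, and which clock bit is dropped (the one ahead of the sixth data bit, counting from the most significant) is the usual MFM convention, assumed here rather than confirmed for any particular controller.

```python
def mfm_encode(byte, prev_bit=0, drop_clock_at=None):
    """Encode one byte into 16 MFM cells (clock, data pairs), MSB first.

    drop_clock_at: index (0 = MSB) of the data bit whose clock cell is
    forced to 0, which is how a sync mark is made distinguishable.
    """
    cells = 0
    for i in range(8):
        data = (byte >> (7 - i)) & 1
        # Normal MFM rule: clock is 1 only between two 0 data bits.
        clock = 1 if (prev_bit == 0 and data == 0) else 0
        if i == drop_clock_at:
            clock = 0
        cells = (cells << 2) | (clock << 1) | data
        prev_bit = data
    return cells


normal = mfm_encode(0xA1)                   # ordinary data byte 0xa1
sync = mfm_encode(0xA1, drop_clock_at=5)    # 0xa1 with the missing clock
print(f"normal 0xa1: {normal:#06x}")
print(f"sync   0xa1: {sync:#06x}")
```

The two cell patterns differ in a single clock cell, which is what lets a decoder pick out the start of a header from the surrounding data.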

3 Find header information

Change the --quiet 1 to --quiet 0. Look at the bytes dumped and see what looks like the cylinder, head, and sector, and what the other bytes might mean. If the bytes don't make any sense, try using format Xebec_104527_512B. The Xebec formats have another sync after the 0xA1, so decoding with WD_3B1 won't be properly synchronized with the data. If neither of these matches, much more effort is needed, which will not be covered in this document.

There is logic to handle various header flags if sufficient information is available to figure out how they are encoded. Look for bits that change in the header that aren't the cylinder etc. values and see if you can figure out what they are likely to encode. See bad_block, alt_assigned, and is_alternate in the existing code to see how they are handled.

To find the CRC length, look for bytes at the end that change randomly. When write turns off it generates false bits that can produce bytes with various values after the real CRC, but they tend not to look random. You can increase the ,32 in the CRC options to ,64 to see more trailing bytes.
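When many headers have been dumped, it can help to tabulate which byte positions stay constant and which vary. The sketch below is only an illustration: column_summary is a hypothetical helper, and the header bytes are made up, not taken from a real disk. Positions with a few small values are candidates for cylinder, head, and sector, while trailing positions that differ in every header are candidates for CRC bytes.

```python
def column_summary(headers):
    """For each byte position, return the sorted list of distinct values seen."""
    length = min(len(h) for h in headers)
    return [sorted({h[i] for h in headers}) for i in range(length)]


# Made-up header dumps for illustration only (not from a real disk).
headers = [
    [0xA1, 0xFE, 0x00, 0x00, 0x00, 0x3C, 0x91],
    [0xA1, 0xFE, 0x00, 0x00, 0x01, 0xE7, 0x0A],
    [0xA1, 0xFE, 0x00, 0x01, 0x00, 0x58, 0xD3],
    [0xA1, 0xFE, 0x00, 0x01, 0x01, 0x0B, 0x66],
]

summary = column_summary(headers)
for pos, values in enumerate(summary):
    kind = "constant" if len(values) == 1 else "varies"
    print(pos, kind, [hex(v) for v in values])
```

In this made-up data, positions 3 and 4 take only the values 0 and 1, while positions 5 and 6 are different in every header, which is the kind of behavior expected from a 2 byte CRC.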

4 Find data information

Hopefully reasonable-looking data will also be displayed with --quiet 0. The Xebec formats don't have a separate 0xa1 sync for the data, so if the data area is not found, try using format Xebec_104527_512B. As with the header, look at the data and find the header bytes, sector data length, and CRC. Try various --sector_length values to determine what the sector length is.

5 Modify code

Find the closest existing format in wd_mfm_decoder.c or xebec_mfm_decoder.c and copy it to use as the starting point for the new format. Also add a description of the format to the comments at the top, like the others. For an example of the changes needed to add a couple of new formats, see the changes between versions 2.18 and 2.19.

Copy the closest match in the CONTROLLER data structure in inc/mfm_decoder.h. Note that entries need to be sorted by header_bytes. Also in inc/mfm_decoder.h, add the new controller enum value to the controller enum at the same position where you put the new entry in the CONTROLLER structure.

There are various other fields that need to be set correctly, such as write_first_sector_number. Some controllers number the first sector 0 and others 1. The error "logical sector -1 out of range" indicates the first sector number is set to 1 but should be 0. Others that may need to change are write_sector_size, header_bytes, data_header_bytes, header_crc_ignore, data_crc_ignore, write_header_crc, and write_data_crc. Stranger formats may require others to be changed.

You also need to add the new controller to mfm_decoder.c so that it calls the correct routine. See the current code.

6 Find CRC parameters

I'm putting new entries in inc/mfm_decoder.h as CONT_MODEL, where I define all the parameters including the CRC. When searching with mfm_read/mfm_util --analyze, CONT_MODEL only uses the parameters specified in mfm_decoder.h, while CONT_ANALYZE tries to find the CRC and other values. As the number of formats grew, the CONT_ANALYZE search was getting too long. See other CONT_MODEL entries for the fields normally filled in.

If the format is properly added, then specifying --format with only the files to process should decode a transitions or emulation file. CONT_MODEL formats don't need the polynomial etc. to be specified on the command line. CONT_ANALYZE formats do.

To find the CRC after you have the other code changes in, you can change the entry to CONT_ANALYZE and use mfm_util --analyze, and it will try to find the CRC. You can then put the values in and change it back to CONT_MODEL. You may need to change header_crc_ignore and data_crc_ignore to ignore the first byte or two for the CRC to be found. Which bytes are included in the CRC varies between controllers.

If --analyze doesn't work you can change the define at the top of mfm_decoder.c to dump the header or data for use in find_crc_info to find the polynomial and initial value. You must have the correct CRC length. Find_crc_info is in the mfm directory. It can be built on faster Unix machines.

./find_crc_info total_bytes crc_bytes > dumpdata
Here total_bytes is header bytes + sector bytes + CRC bytes. The header will always have patterns that allow the CRC polynomial to be found. The data may not.

If the data CRC is not found with the above, extract the data to use with reveng.

mfm_util --tran disk.tran --format SHUGART_1610 --ext /tmp/t --quiet 0 > /tmp/tt

The first line of the header data has the hex values:
Got exp 0,0 cyl 0 head 0 sector 0,0 size 1024 bad block 0
0 0 1:0xa1,0xd9,0x8b,0x58,0x83,0xec,0x04,0xec,0xc8,0x8c,0x20,0x2d,0x05,0x00 ,0x00,0xbe,

Edit /tmp/tt, extract two sectors with different contents, and write the data to two files. Remove the # # #: prefix (0 0 1: in the example above) from the first line of each. For this example the data was written to /tmp/h1 and /tmp/h2. Then use reveng, first converting the hex data to binary.

xxd -i -r -p /tmp/h1 > /tmp/h1b
xxd -i -r -p /tmp/h2 > /tmp/h2b
Set -w to the CRC width in bits.
reveng -w 32 -b -s -f /tmp/h1b /tmp/h2b
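As an alternative to hand-editing the dump before converting it, the hex values can be pulled out with a short script. This is a sketch only: parse_dump_line and dump_to_binary are hypothetical helpers, not part of the MFM tools, and the sketch assumes each dumped line is a comma separated list of 0x values with an optional prefix ending in a colon, as in the example output shown earlier.

```python
import re


def parse_dump_line(line):
    """Return the bytes listed on one dump line, ignoring any '# # #:' prefix."""
    # Keep only the text after the last ':' (the whole line if there is none).
    _, _, tail = line.rpartition(":")
    return bytes(int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]{1,2}", tail))


def dump_to_binary(lines, path):
    """Concatenate the bytes from several dump lines and write them to path."""
    with open(path, "wb") as out:
        for line in lines:
            out.write(parse_dump_line(line))


# The example line from the output shown earlier in this section.
example = ("0 0 1:0xa1,0xd9,0x8b,0x58,0x83,0xec,0x04,0xec,"
           "0xc8,0x8c,0x20,0x2d,0x05,0x00 ,0x00,0xbe,")
data = parse_dump_line(example)
print(len(data), data.hex())
```

Calling dump_to_binary with the lines of one sector and a path such as /tmp/h1b would produce a binary file to pass to reveng.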

A 32 bit brute force search typically takes an hour but can take 16 hours on a somewhat old laptop if all polynomials need to be searched. To search common polynomials first, add -p 00a00805 -q 00a00805. Other common values are 140a0445 and 104c981. If the initial value found isn't simple, such as 0 or all 1's, you can try removing the 0xa1 byte and see if that helps. Some controllers don't include it in the CRC. Some don't include the first two bytes.
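Once a candidate polynomial and initial value are reported, they can be sanity checked against a dumped sector before editing the code. The sketch below assumes a plain MSB-first CRC with no bit reflection and no final XOR, in which case running the CRC over the bytes followed by their stored CRC leaves a remainder of 0. Whether a given controller follows that convention is an assumption, crc_msb_first and check_sector are hypothetical helpers, and the sector bytes are made up.

```python
def crc_msb_first(data, poly, init, width):
    """Bitwise CRC, MSB first, no bit reflection and no final XOR."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    crc = init & mask
    for byte in data:
        crc ^= byte << (width - 8)
        for _ in range(8):
            if crc & top:
                crc = ((crc << 1) ^ poly) & mask
            else:
                crc = (crc << 1) & mask
    return crc


def check_sector(raw, poly, init, width, skip=0):
    """True if raw[skip:] (bytes followed by their CRC) leaves remainder 0."""
    return crc_msb_first(raw[skip:], poly, init, width) == 0


poly, init, width = 0x00A00805, 0xFFFFFFFF, 32

# Made-up sector: 0xa1 mark, one header byte, then 8 data bytes.
payload = bytes([0xA1, 0xF8]) + bytes(range(8))

# Case 1: the CRC covers everything including the 0xa1 byte.
raw_all = payload + crc_msb_first(payload, poly, init, width).to_bytes(4, "big")
ok_all = check_sector(raw_all, poly, init, width)

# Case 2: the controller leaves the 0xa1 byte out of the CRC.
raw_skip = payload + crc_msb_first(payload[1:], poly, init, width).to_bytes(4, "big")
ok_skip = check_sector(raw_skip, poly, init, width, skip=1)

print(ok_all, ok_skip)
```

The skip argument mirrors the idea of ignoring the first byte or two: if a check fails with skip=0, trying skip=1 or skip=2 on real dumps is a cheap way to test whether the leading bytes are excluded from the CRC.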

7 Add ext2emu support

If you have a manual describing the track format with the gaps, you can create the TRK_L data for the format and put a pointer to it in track_layout in the CONTROLLER data. How to determine this information without a manual is too complex to describe here for now. There is various disabled code I use for finding the gap lengths and values. Some is controlled by PRINT_SPACING in mfm_decoder.c. If you are not supporting ext2emu, put NULL in for track_layout.

The easiest way to create the TRK_L is to copy the closest existing one and modify it. Run ext2emu and fix the data to get rid of the errors and warnings. Then run mfm_util to decode the data and verify that there are no errors and that the output data matches the input.



Feel free to contact me, David Gesswein djg@pdp8online.com with any questions, comments on the web site, or if you have related equipment, documentation, software etc. you are willing to part with.  I am interested in anything PDP-8 related, computers, peripherals used with them, DEC or third party, or documentation. 
