Everything standardized, nothing works!
In some setups, mostly big storage arrays, SATA drives are prevented from spinning up automatically. There are many standards and lots of technologies to control this behaviour. Unfortunately, I learned about all of them.
A few years ago I got a new home server which just did not work as I expected. This article will follow the different topics I learned about during this endeavour.
I try to keep all paragraphs separated from each other so that you can skip one if you're not interested.
The basic problem
As described in this post I used an APU2c4 and an Intertech chassis as home server and added additional SATA controllers to be able to use 4 to 5 disks.
After realising that controllers with an ASM1061 chip won’t work reliably with an APU board, I bought one based on a Marvell 88SE9215.
So I got the disks, the backplane with its SAS connector, a cable adapting to 4x SATA and the controller with 4x SATA.
I put everything together and surprise: It didn’t work, the disks did not turn on!
I tested them with a USB adapter, they worked fine. Even with just the power cable connected they spin up immediately. So what’s going on here?!
So I started digging around, what could it be. First thing that comes to mind is the backplane. Even if the power cable was connected to the backplane, the drives did not spin up.
The backplane has a 4 pin Molex power connector supplying 12V and 5V. So I took my multimeter and checked against the SATA pinout. Both voltages are available at the disk, the 3.3V pins were at 0V, but they aren’t used in disks anyway. I realised that only one 12V pin is connected, but hey, should be enough, eh?
The backplane also has two LEDs per disk to show status and activity. If a disk is connected the blue status LED lights up, so the backplane registers a device presence. With some older disks the green activity LED additionally blinks once. This LED is wired together with the staggered spinup pin (pin 11 of the power connector) which is, according to my research, normal.
So this pin is high on my backplane with no drive connected and somewhere around 3.3V with a connected drive. Looks like a pull-up resistor on the backplane tells the drives to enable staggered spinup. With staggered spinup enabled the drives wait for a specific SATA command before they start the motor. At least this explains why the disks aren’t spinning up when connected to the backplane.
In the end I could not determine any obvious fatal error or manufacturing defect on the backplane.
The connector on the backplane is an SFF-8087 connector (also called miniSAS, or internal SAS) which contains 4 Rx/Tx pairs with their related ground lines. Additionally there are pins for a 4 pin sideband connection which can be used by controllers to control the backplane (e.g. for lighting up error or replacement LEDs). I also checked these pins on my backplane but could not measure anything on them.
So far so good, this connector can serve 4 distinct SAS or SATA lanes, but SAS and SATA don’t just differ in their commands and protocol but also in their pinout.
SAS controllers and devices have the same Rx/Tx pinout on their connectors and the swap is done in the cable (crossover).
However SATA controllers and devices have switched Rx/Tx pins on their connectors and the cable is straight through.
And I now have a SATA disk, a backplane which supports both, a SATA to SFF-8087 SAS cable and a SATA controller, well fuck.
After spending oceans of time researching I figured out that there are two different types of these adapter cables with a lot of different names:
the OCR cable
which connects a backplane with SFF-8087 connector to a controller with four SATA ports, that’s what I need.
This is also called
- Reverse Fanout
- Reverse Breakout
- Straight Through
the OCF cable
which connects a SAS controller to a backplane or drive with SATA ports.
This is also called
- Forward Fanout
- Forward Breakout
And yes, a third of all online shops got it just wrong or didn’t get it at all. And yes, I bought 3 cables: two were wrong, one was the right one.
Each around 16-20€.
I connected everything and guess what: The drive did not spin up!
Staggered Spinup (SSU)
Now that I’m absolutely sure that I got the right cable I have to search for the error somewhere else. So maybe the disk is somehow told to wait with the spinup. As mentioned before there is a special pin for signaling the drives to do staggered spinup.
The 11th pin on the SATA power connector is not hardwired to its neighbors and is reserved for staggered spinup and activity signaling. If pin 11 is connected to ground, which is done by a resistor on the disk, the drive starts up immediately (normal mode). But if the pin is pulled high when the drive is connected, it won’t spin up until a SATA link is established and the controller sends an appropriate command.
I was frustrated enough to risk losing a disk to get this thing working, so I just soldered the pins together and, well, what to say: the drive spun up! “Holy cow I fixed it” I thought, went to my computer and realized that the drive was still not listed in Linux. Instead the kernel told me:
[ 1126.507176] ata4: COMRESET failed (errno=-32)
Dec 18 07:34:38 archiso kernel: ata4: COMRESET failed (errno=-32)
Dec 18 07:34:38 archiso kernel: ata4: reset failed (errno=-32), retrying in 8 secs
Dec 18 07:34:47 archiso kernel: ata4: SATA link down (SStatus 0 SControl 300)
[ 1135.653828] ata3: COMRESET failed (errno=-32)
Dec 18 07:34:47 archiso kernel: ata3: COMRESET failed (errno=-32)
Dec 18 07:34:47 archiso kernel: ata3: reset failed (errno=-32), retrying in 8 secs
Dec 18 07:34:56 archiso kernel: ata3: SATA link down (SStatus 0 SControl 300)
These messages indicate that even the COMRESET at the SATA link initialization failed. But at least something tries to connect here at all. There must be something else which keeps the drive from talking to the controller…
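To read the `SStatus` value in those messages a bit more closely, here is a minimal sketch that splits it into its three fields. It assumes the usual SATA status register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) and that the kernel prints the value in hex; the function name `decode_sstatus` is made up for this example.

```shell
# decode_sstatus HEX
# Split a SATA SStatus value (hex, as shown in the kernel log) into fields.
# The body runs in a subshell "( ... )" so the variables stay local.
decode_sstatus() (
    value=$((0x$1))                 # the log shows the register in hex
    det=$(( value & 15 ))           # bits 0-3: device detection
    spd=$(( (value >> 4) & 15 ))    # bits 4-7: negotiated link speed
    ipm=$(( (value >> 8) & 15 ))    # bits 8-11: interface power state
    case $det in
        0) text="no device detected, no PHY communication" ;;
        1) text="device detected, but no PHY communication" ;;
        3) text="device detected, PHY communication established" ;;
        4) text="PHY offline" ;;
        *) text="reserved" ;;
    esac
    printf 'DET=%d SPD=%d IPM=%d (%s)\n' "$det" "$spd" "$ipm" "$text"
)
```

Running `decode_sstatus 0` on the value from my log says the controller sees no device at all, which fits a drive that spins but never answers. A healthy 6 Gbps link would typically report `133` instead.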
Power Up In Standby (PUIS)
Ahh the internet says there is another power down feature for SATA disks. It’s called Power Up in Standby and it means that the drive is set into a mode where it does not power up when it’s plugged in no matter what state the SSU pin is in. Alright so another thing to fix then we’re good to go. Some SATA drives have a jumper to set the PUIS mode, I actually found an old one in my stash and with the jumper set, the drive stays quiet when power is attached. Sadly the drives I want to use in the server don’t have jumper pins to set PUIS. But you can also read this feature via software (like hdparm).
hdparm -I /dev/sdX | grep -i -B 1 power-up
   *    SMART feature set
        Power-Up In Standby feature set
The missing * at the beginning of the line indicates that this feature is supported but not enabled!
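If you have to check more than one disk, squinting at asterisks gets old. The following is a small sketch that reads `hdparm -I` output on stdin and reports the PUIS state. It assumes the feature line reads exactly "Power-Up In Standby feature set", as in the output above; the function name `puis_status` is my own invention.

```shell
# puis_status
# Read `hdparm -I` output on stdin and report the PUIS state.
# A leading "*" on the feature line means the feature is enabled.
puis_status() (
    line=$(grep -i 'Power-Up In Standby feature set') || {
        echo "not supported"        # the drive does not list the feature
        exit 0
    }
    case $line in
        *'*'*) echo "enabled" ;;
        *)     echo "supported, not enabled" ;;
    esac
)
```

Usage would be something like `sudo hdparm -I /dev/sdX | puis_status`.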
With the following commands one can control the PUIS feature flag:
- disable PUIS:
sudo sg_sat_set_features -f 0x86 /dev/sdX --verbose
- enable PUIS:
sudo sg_sat_set_features -f 0x06 /dev/sdX --verbose
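Because the two feature codes are easy to mix up, a tiny dry-run helper can spell them out. This is only a sketch: `puis_command` is a hypothetical name, and it just prints the command from above instead of running it, so you can look at it before touching a disk.

```shell
# puis_command enable|disable DEVICE
# Print (do not run) the sg_sat_set_features call for the wanted PUIS state.
puis_command() (
    case $1 in
        enable)  feature=0x06 ;;    # SET FEATURES: enable PUIS
        disable) feature=0x86 ;;    # SET FEATURES: disable PUIS
        *) echo "usage: puis_command enable|disable DEVICE" >&2; exit 2 ;;
    esac
    [ -n "$2" ] || { echo "usage: puis_command enable|disable DEVICE" >&2; exit 2; }
    echo "sg_sat_set_features -f $feature $2 --verbose"
)
```

For example, `puis_command disable /dev/sdX` prints the disable command; prefix the output with `sudo` when you are sure about the device.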
I tested it with the old drive and as expected it has the same effect as the jumper setting but the feature could neither be enabled nor disabled on the drives I want to use.
Power disable (PWDIS)
Well, believe it or not, but since SATA Rev 3.2+ there is another power feature called Power Disable (PWDIS).
Like staggered spinup it is controlled via a pin on the SATA power connector, in this case pin 3, which was one of three 3.3V pins in earlier SATA revisions.
See tom’s HARDWARE for further details.
I verified that this pin is not connected to the other two 3.3V pins (like on older disks) and then I started the whole endeavour again. I taped the pin, I soldered it to ground, I shorted it to ground after the disk was connected. Then I did all that in various combinations with both the SSU and the PWDIS pin.
The only measurable effect was some magic smoke leaving the backplane indicating that one of the activity LEDs has left our solar system. All four drives still not spinning up.
The cable, again
Still, my mind told me that the cable was the most likely culprit after all. I searched for a replacement in all offices and workshops I could think of and got a bunch of HP branded 2x SATA -> SAS adapter cables. With these the drives spun up and were recognized by the host system. Oh how crazy is that, was it the wrong cable after all? Well, with the HP cable I could only use the first two drives, but that’s at least something. I used the machine for more than a year with this setup till I ran out of storage. Anyway, if two lanes work, there must be a solution for four SATA lanes. I wanted to understand what the difference between the working and the non-working cable is, so I ripped one of each type open.
These pictures show the PCB of a noname 4 lane cable (left) and a 2 lane HP cable (right). As you can see the layout is the same. In the center there are the sideband pins and on each side there is a SATA data line pair surrounded by GND. The HP cable had the sideband connected, I desoldered them, nothing changed. Also, all wires had continuity from one end to the other.
As to be expected, no further noticeable difference between them.
The project was in such a horrible state that the occasionally heavy drinking started to become occasionally more often!
Last on the list was the controller, although I seriously doubted that it could be the root cause.
Nevertheless I checked my options.
As mentioned above I use a Marvell 88SE9215 which has no RAID capabilities or anything like that. The tooling provided by Marvell can only handle their RAID controllers and did not even recognize mine. So I spent a night or two going through everything the AHCI driver exposes via sysfs about that controller. Then I read the source code of the AHCI driver and found a Marvell specific option. Additionally there is another driver for these controllers which I could try, but honestly I didn’t even want to anymore.
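For anyone who wants to repeat the sysfs digging, here is a rough sketch of one small part of it: listing the current and allowed speed of each SATA link. It assumes the `sata_spd` and `sata_spd_limit` attributes under `/sys/class/ata_link/`, which libata provides on reasonably recent kernels. The function name `ata_link_speeds` and the optional root argument are only there for this example, and this is not a record of exactly what I looked at back then.

```shell
# ata_link_speeds [SYSFS_ROOT]
# Print current and allowed speed for every SATA link found in sysfs.
# SYSFS_ROOT defaults to /sys; another root is handy for dry runs.
ata_link_speeds() (
    root=${1:-/sys}
    for link in "$root"/class/ata_link/*; do
        [ -r "$link/sata_spd" ] || continue   # skip if nothing matched
        printf '%s: current=%s limit=%s\n' \
            "${link##*/}" \
            "$(cat "$link/sata_spd")" \
            "$(cat "$link/sata_spd_limit" 2>/dev/null || echo unknown)"
    done
)
```

A link without a working device usually shows `<unknown>` as its current speed, which is a quick way to see which ports never came up.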
While waiting for the next best time to check out some other combinations of parameters I talked to a lot of people about this mess.
Most of them successfully avoided getting dragged into this, some others lent me hardware for testing. My partner then told me that she likes the idea of fixing such a problem with money and asked what my options on that would be. So I answered more or less sarcastically: “Well, I could get a cable from a reputable brand which costs 4 times as much”
- “Then you should try that!”
So I paid 45€ (!) for a SATA->SAS cable with all the schibberish I already told you about, plugged it in and it just worked. And it still does…