Designing a Power-Down Tester for Solid-State Storage Products Used in Portable Medical Devices
Portable, space-saving medical devices are the trend in today’s health care industry. Medical equipment manufacturers have adopted several popular storage solutions, including solid-state storage, which is valued for its space and power savings. Solid-state drives (SSDs) offer, in most cases, significant power savings, which translate into longer battery life for mobile applications. This makes them especially well-suited for portable medical devices.
As applications for solid-state storage continue to multiply, it is important for embedded system designers to be able to characterize drive performance under varying power conditions. Designers must also consider how power anomalies affect data integrity: medical devices are commonly switched off and on without warning, alternate frequently between AC and battery backup power, and encounter the spikes, surges, brownouts and blackouts common in most commercial power supplies.
A robust test platform is necessary to generate a wide range of test conditions that can be accurately repeated. Because solid-state storage has no moving parts, it performs better in environmentally rugged applications than rotating hard drives, whose precision mechanics cannot withstand the rigors and duty cycles of most embedded systems. Table 1 below compares solid-state storage and rotating hard drives.
Medical device designers are now realizing that SSDs offer several advantages over hard drives, but they must consider how SSDs will operate in medical environments with varying power input stability. The possibility of a power disturbance in medical applications must be considered since a large number of embedded systems in medical devices run on battery power. If the host system loses power in the middle of a write operation, critical system files may be overwritten or sector errors may result, causing the drive to fail.
In this article, you will find a test platform designers can use to evaluate the performance of a number of solid-state storage products under various power disturbance conditions. Engineers will now be equipped with a necessary tool to determine the criteria needed to meet medical device deployment requirements.
Medical systems often operate in less-than-ideal power conditions. They can experience power disturbances ranging from spikes to brownouts, which can cause a significant amount of data and drive corruption, leading to field failures and potential loss of revenue from equipment returns.
Data is typically read and written by the host system in increments of at least 512 bytes called sectors. Data corruption can result if there is a power disturbance during the sector write operation. On subsequent power-ups, a read sector error may occur when the sector is read. The read sector error occurs due to a mismatch between the data in the sector and the error checking information for that sector. Typically in solid-state drives, the sector will be replaced by a spare on the next write. If this situation goes unchecked, the SSD will cease to operate once all spares are depleted.
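The spare-consumption behavior described above can be illustrated with a toy model. This is a minimal sketch, not any vendor's firmware: the `SpareModel` class, its method names and the spare counts are hypothetical, and real drives manage spares in vendor-specific ways.

```python
from dataclasses import dataclass, field


@dataclass
class SpareModel:
    """Toy model of spare-sector consumption (hypothetical, not real firmware)."""
    spares_left: int                      # assumed factory-defined spare count
    operational: bool = True
    bad_sectors: set = field(default_factory=set)

    def read_error(self, lba: int) -> None:
        """Record that a read of sector `lba` failed its error check."""
        if not self.operational:
            return
        if self.spares_left == 0:
            # No spare remains to replace the sector: the drive stops operating.
            self.operational = False
            return
        self.bad_sectors.add(lba)

    def write(self, lba: int) -> None:
        """On the next write, a sector flagged by a read error takes a spare."""
        if self.operational and lba in self.bad_sectors:
            self.bad_sectors.discard(lba)
            self.spares_left -= 1


def errors_until_failure(spare_count: int) -> int:
    """Return how many read errors occur up to and including the fatal one."""
    drive = SpareModel(spares_left=spare_count)
    errors = 0
    while drive.operational:
        drive.read_error(lba=errors)   # power event corrupts a sector
        drive.write(lba=errors)        # next write replaces it with a spare
        errors += 1
    return errors
```

In this model a drive with N spares absorbs N read sector errors and stops operating on the next one, which is why unchecked power-related errors eventually take the drive out of service.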
File System Analogy:
A read sector error is analogous to a bad cluster found by Scandisk. Once a cluster is labeled bad, it is unusable in the file system. Similarly, a solid-state drive treats a sector with a read error as one that must be replaced with a spare. The difference between the two scenarios is that bad clusters found by Scandisk make the usable drive smaller, whereas a solid-state drive consumes spares. When all spares are exhausted, the next read error causes the drive to cease operating.
Figure 1 shows the relationship between sectors and files. If a power disturbance occurs during a large file write, the file may be truncated, but it should be truncated at a sector boundary for the storage device to be considered “robust.” In this situation, file corruption may have occurred, but no spares have been used. The significant problems described below result if the corruption occurs in the middle of a 512-byte sector or during the FAT update.
Below are several examples of failures:
- When a read sector error occurs, many applications automatically produce a system-level error, which results in system downtime until the error is corrected and its cause is identified and eliminated.
- An unexpected power event occurs during the middle of a write operation, resulting in a read sector error.
- In addition to losing data, a solid-state drive may interpret additional read sector errors as defective sectors and may unnecessarily replace them with spare sectors. Once the number of factory-defined spare sectors decreases to zero, the solid-state drive becomes inoperable and needs replacement.
- A brownout or low power condition can cause the address lines of the storage device to reach an unstable state. If this occurs, and write commands are still being accepted by the drive, data can inadvertently be written to the wrong location, corrupting critical system files. This also can result in a critical failure requiring the unit to be replaced.
Additionally, power disturbances that cause data and drive corruption can affect host systems differently, depending on where and when they occur. The following provides an overview of the different types of files that can be corrupted and how they impact the host:
- Critical System Files - When using an operating system, loss of one or more of the critical system files (config.sys, system.ini, etc. depending on the operating system used) will create system errors or render the drive unbootable. Once again, this situation causes unscheduled downtime and may require that the drive be reformatted and the operating system be re-installed.
- Master Boot Record (MBR) - The MBR is a single-sector file which is always located at logical block zero. It must be repaired if it becomes corrupted, or it will be impossible to access any volumes on the drive. Corruption of the MBR will render the drive inoperable. In order to re-use the drive, it will need to be re-partitioned and re-formatted.
- File Allocation Table (FAT) - Corruption of the File Allocation Table (FAT) will cause the user to lose access to the files on the drive. All data will be lost, and the drive must be reformatted.
- Corruption of User Data Files - User data files also can be corrupted. This often occurs without the user being aware of it until it is too late. After a power disturbance has occurred, it is recommended that the entire drive be checked and any sector errors repaired.
Possible causes of file corruption include:
- Address lines that float to undetermined states – If input power drops below the specified minimum operating level, but there is still enough power to program the solid-state storage component, data could be written to the wrong address.
- Inadvertent operating system overwrite – This is due to system level instability brought on by a loss of power.
With either of these corruption situations, it is critical to understand that the SSD is not physically damaged or “worn out.” In the case of critical file overwrite, the host can re-partition and re-format the drive. In the scenario of running out of spares, the product must be returned to the vendor for failure analysis. Once the failure is confirmed, the vendor must re-initialize the drive as part of its failure analysis process. After re-initialization and when no problems are found, the units are returned to the user. In both cases, the data will be lost, and downtime will occur. However, the drive can be used again.
In today’s medical devices, technology is required to reduce the impact of an abrupt system power-down. SSDs in medical systems are exposed to many forms of power disruption and interference that can cause voltage spikes, brownouts and complete loss of system power. For this reason, many companies have included power-down testing in their qualification processes for any new storage media they are considering. The ability to handle the varied power problems listed above has a direct impact on product reliability, customer goodwill and total cost of ownership.
Not all SSDs have been engineered the same. Advanced new SSDs have been integrated with various technologies based on the requirements of the system. If a voltage threshold has been reached, the solid-state drive can send a busy command to the host so that no more commands are received until the power level stabilizes. Integrated voltage detection circuitry provides an “early warning” of a possible power anomaly. Address lines can be latched as shown in Figure 2 to ensure that data is written to the proper location.
Cache and buffer sizes can be made as small as possible to minimize the time data is held in volatile memory, thereby reducing the power and time required to clear the buffer, update the control bytes and complete the data transaction.
Additional capacitance can be added to the solid-state drive to allow more time to clear larger buffers. This approach is generally practical only for physically larger form factors.
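The trade-off between added capacitance and buffer-flush time can be estimated to first order. The sketch below assumes the drive draws a constant current while its hold-up capacitor discharges, so t = C × (V_start − V_min) / I. The function names and all numeric values are illustrative assumptions, not figures from any particular drive.

```python
def holdup_time_s(capacitance_f, v_start, v_min, load_current_a):
    """Seconds for a capacitor to sag from v_start to v_min at constant current.

    First-order estimate only: assumes a constant load current and ignores
    capacitor ESR and regulator behavior.
    """
    if load_current_a <= 0:
        raise ValueError("load current must be positive")
    if v_min >= v_start:
        return 0.0
    return capacitance_f * (v_start - v_min) / load_current_a


def required_capacitance_f(flush_time_s, v_start, v_min, load_current_a):
    """Capacitance needed to keep the rail above v_min for flush_time_s."""
    if v_start <= v_min:
        raise ValueError("v_start must exceed v_min")
    return load_current_a * flush_time_s / (v_start - v_min)


if __name__ == "__main__":
    # Hypothetical example: 100 uF on a 3.3 V rail, 2.7 V minimum
    # programming voltage, 50 mA drawn while flushing the buffer.
    t = holdup_time_s(100e-6, 3.3, 2.7, 0.05)
    print(f"hold-up time: {t * 1e3:.2f} ms")
```

With these assumed values the hold-up time comes to roughly 1.2 ms. A larger buffer that takes longer to flush needs proportionally more capacitance, which is why the approach suits larger form factors.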
Many factors must be considered when designing a power-down tester including:
- Ramp rate of power-down on power and all I/O pins - The steeper the ramp, the faster the voltage will fall below the storage components’ minimum programming voltage, resulting in a greater chance for error. The ramp rates of various systems are shown in Figure 3 below. The rates for the standard desktop PC and the embedded system emulate the loss of system power. The power-down tester should take into account the steeper ramp rate that emulates pulling out the solid-state drive while writing. Ramp times vary and are highly application-specific. A power-down tester should be designed to validate the worst-case scenario.
- Power-down timing - A complete power-down test requires power to be removed at various intervals of the write cycle. The ability to strictly control and repeat this variable will expose any weaknesses in the write cycle and will accelerate any failures.
- Cut power to I/O and Vcc at the same time - It is important to eliminate any Vcc feedback that may occur if power is cut to Vcc and not to the I/O. Many test platforms only cut power to Vcc, resulting in invalid test data and potential damage to the storage device.
- Validation and verification - Read sector errors may occur at any point, yet the storage device may continue to operate. The worst-case scenario is that the first read sector error occurs in a critical system file. Designers may choose to run the test until the drive becomes inoperable. The amount of verification performed directly affects test time, and verifying after every cycle could be prohibitive.
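The power-down timing factor above amounts to sweeping the cut-off point across the write cycle and repeating each point. A minimal sketch of such a schedule follows. The function name, the step size and the repeat count are assumptions for illustration, and the write-cycle duration would have to be measured for the drive under test.

```python
def cut_delays_us(write_time_us, step_us=1, repeats=1):
    """Yield power-cut delays, in microseconds after the write is issued.

    Covers the interval 0..write_time_us in increments of step_us and
    repeats every delay `repeats` times so failures can be reproduced.
    """
    if step_us < 1 or repeats < 1:
        raise ValueError("step_us and repeats must be at least 1")
    for delay in range(0, write_time_us + 1, step_us):
        for _ in range(repeats):
            yield delay


if __name__ == "__main__":
    # Hypothetical 4 us write cycle, sampled every 2 us, each point twice.
    print(list(cut_delays_us(4, step_us=2, repeats=2)))
```

Because every delay is generated deterministically, a failure seen at a given point in the write cycle can be revisited on the next run with the same schedule.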
Figure 4 illustrates a sample tester flow diagram:
The following is a description of each stage of the flow diagram:
- Write to unit under test (UUT) - Write a single sector using an ANSI ATA write command. This isolates the error in its simplest form. (Sample code can be found in Appendix A below.)
- Timer to shutdown - Shutdown granularity is 1 µs. This allows the shutdown sequence to be varied so that power is removed at different points in the write cycle.
- Shutdown UUT - Shut off power and all signals to the UUT together to simulate the worst-case scenario.
- Delay – Plan for at least 10 ms to allow any stray capacitance to discharge.
- Check for Read Errors - Read the previously written sector using an ANSI ATA read command and check the read sector status. A status of 0x51 constitutes a read sector error. The test code should track the number of cycles to the first failure and the number of failures against the total number of cycles.
- Validate data in other sectors - Ensure that data was not written to another location by mistake. The amount of validation required will significantly impact test time.
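The stages above can be sketched as a host-side loop. This is an illustration under stated assumptions, not the article's Appendix A code: `SimulatedUUT` is a hypothetical stand-in for the drive and tester hardware that simply corrupts the sector whenever power is cut before an assumed write time has elapsed. The validation of other sectors is omitted for brevity.

```python
import time

ATA_STATUS_OK = 0x50          # DRDY | DSC: command completed without error
ATA_STATUS_READ_ERROR = 0x51  # same bits plus ERR: read sector error


class SimulatedUUT:
    """Hypothetical stand-in for the unit under test and its power switches."""

    def __init__(self, write_time_us):
        self.write_time_us = write_time_us  # assumed duration of a sector write
        self.powered = False
        self.corrupt = set()

    def power_on(self):
        self.powered = True

    def write_sector_then_cut(self, lba, cut_delay_us):
        """Write one sector, then cut Vcc and I/O together after cut_delay_us."""
        if cut_delay_us < self.write_time_us:
            self.corrupt.add(lba)      # assumption: an interrupted write corrupts
        else:
            self.corrupt.discard(lba)
        self.powered = False

    def read_sector_status(self, lba):
        return ATA_STATUS_READ_ERROR if lba in self.corrupt else ATA_STATUS_OK


def run_power_down_test(uut, cut_delays_us, lba=0, settle_s=0.010,
                        sleep=time.sleep):
    """Run one write/cut/read cycle per delay and tally read sector errors."""
    cycles = failures = 0
    first_failure_cycle = None
    for delay in cut_delays_us:
        cycles += 1
        uut.power_on()
        uut.write_sector_then_cut(lba, delay)  # write, timer, shutdown
        sleep(settle_s)                        # at least 10 ms to discharge
        uut.power_on()
        if uut.read_sector_status(lba) == ATA_STATUS_READ_ERROR:
            failures += 1
            if first_failure_cycle is None:
                first_failure_cycle = cycles
    return {"cycles": cycles, "failures": failures,
            "first_failure_cycle": first_failure_cycle}
```

The loop reports the two figures the flow calls for: the cycle on which the first failure occurred and the failure count against total cycles. On real hardware, the `SimulatedUUT` methods would be replaced by calls into the tester's own command interface.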
The FPGA block in the above diagram provides all logic and timing for program flow. It contains the CPU, timer, IDE controller, serial interface, and power control unit. The CPU manages the data flow between all the sub-units within the FPGA. The timer controls the power ON/OFF timing with 1 µs resolution. The IDE controller provides I/O buffers for address, data, and control lines for the unit under test. The power control unit sends the control signals to the on-board power and analog switches. The serial interface provides a full-duplex communication channel between the FPGA and a host PC for receiving user commands and displaying the tester’s status.
To simulate a real power-down scenario, the transceivers for the address, data, and control signals should be disconnected from the unit under test. The analog switches have an on-resistance of 50 mΩ and an off-resistance of several MΩ. The power switch unit turns the supply voltage to the unit under test on and off under control of the timer unit within the FPGA. The DC/DC converter converts the 16V DC input voltage to the required system core voltages. The PC provides a generic hyper-terminal, which sends commands to the tester and displays the tester's current status.
Overcoming Power Disturbance Issues in SSDs
It is possible to eliminate errors related to power disturbances, which enables designers to fully realize the advantages of implementing SSDs in medical devices. These drives meet healthcare industry demands for small size and low power consumption while substantially increasing reliability in terms of physical viability and data security. The benefits of this technology include improved performance, reduced downtime and a lower total cost of storage ownership, which translate into improved patient care. Designers will also find the test platform a good starting point for optimizing their applications by maximizing drive performance under varying power conditions and meeting system deployment requirements for solid-state storage in a wide range of medical devices.
Gary Drossel is the Vice President of Product Planning at SiliconSystems.