
Who has problems with Shingled Magnetic Recording (SMR) Hard Drives?


Ernst Lobsiger
 

Dear All

For some weeks now I have been testing a new design of my TELLICast receivers. Instead of my 12-year-old
Fujitsu/Siemens Esprimo P7935 (Core2 Duo processor) I took a smaller 7-year-old DELL Optiplex
7010 SFF (i5 processor). In both receivers I use a WD-Blue 1TB hard drive as data disk. As the
DELL 7010 is SFF I use the 2.5'' HDD while in my minitower P7935 I have always used 3.5'' HDDs.

I have tmp and received data on the data HDD, which has worked fine with GNU/Linux amd64 TC clients.

This is my problem: After a good start the DELL PC always runs into big trouble with I/O-Waits.
I have now tried all my tricks like ext4fs without journal and mount option noatime for speed.

I can only run the DELL receiver when I put tmp and received on a RAM disk and immediately
remove files when complete. This is not exactly what a TELLICast receiver should look like :-\ ...

Today I came across a link that *might* reveal the cause of my problems. I noted that the new
WD-Blue 1TB 2.5'' HDD uses SMR while the WD-Blue 3.5'' 1TB HDD uses CMR recording technology.

SMR is rather new stuff and has not been disclosed in data sheets until recently. I found these links:

https://www.tomshardware.com/news/wd-lists-all-drives-slower-smr-techNOLOGY

https://toshiba.semicon-storage.com/ap-en/company/news/news-topics/2020/04/storage-20200428-1.html

I also studied the (software) Linux RAID Wiki and they say frankly forget SMR for RAID setups.

https://raid.wiki.kernel.org/index.php/Linux_Raid


I attach a timeline that shows how those I/O-Waits slowly build up and after days finally stall the bare DELL
receiver while my much weaker Fujitsu/Siemens even processes all Sentinel 3A+3B EFR images for months.


As we go towards MTG we will upgrade Data HDDs or sometimes even use a NAS as receiver. MY QUESTION:


Is there somebody out there that has had similar TC problems with consumer grade Data HDDs?
If so, can this person please check whether this could be due to SMR technology of this HDD?


Best regards
Ernst



Christoph Neuhaus
 

Dear Ernst,

TLDR: give dm-zoned a try.


Is there somebody out there that has had similar TC problems with consumer grade Data HDDs?
If so, can this person please check whether this could be due to SMR
technology of this HDD?
Unfortunately I cannot answer these questions directly. Instead I can offer some thoughts as a work colleague had the "privilege" to replace all the disks in a pretty huge NAS because of the WD SMR crap. And at the bottom you'll find two possible workarounds for your use case.


Before we dive into the mess that SMR is, I assume that you checked the SMART values. In particular, the unload / load count is interesting. Modern hard disks, especially those designed for use in mobile devices, tend to have aggressive energy management, which leads to a lot of unload / load cycles. I guess your 2.5" disk falls into that category. In earlier times it was possible to tune some parameters with 'hdparm' even on consumer grade hardware - I don't know if this is still the case. Maybe it would help to turn off the energy management and see what happens.
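
A minimal Python sketch of the SMART check mentioned above: it pulls the raw Load_Cycle_Count and Power_On_Hours values out of 'smartctl -A' style output and computes load cycles per hour. The sample table is made up for illustration, the helper names are hypothetical, and real drives may format raw values differently (some report hours as e.g. 1200h+30m), so treat it as a starting point only.

```python
import subprocess

def parse_smart_attributes(smartctl_text):
    """Return {attribute_name: raw_value_string} from 'smartctl -A' style output."""
    attrs = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9]
    return attrs

def read_smart_attributes(device):
    """Run smartctl on a device (needs root and smartmontools installed)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)

def load_cycles_per_hour(attrs):
    """Average head load/unload cycles per power-on hour."""
    cycles = int(attrs["Load_Cycle_Count"])
    hours = int(attrs["Power_On_Hours"])
    return cycles / hours

# Made-up sample rows in the usual smartctl table layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1200
193 Load_Cycle_Count        0x0032   180   180   000    Old_age   Always       -       60000
"""

if __name__ == "__main__":
    print(load_cycles_per_hour(parse_smart_attributes(SAMPLE)))  # prints 50.0
```

If the rate turns out to be high, 'hdparm -B 255 /dev/sdX' asks the drive to switch off Advanced Power Management; not every drive honours that request.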


The problem with the SMR technology is that it works fine only as long as you mainly have write once / read many data. If you want to use it for continuous and repeating write operations you are in a bad spot very quickly, all the more so if you want to overwrite old data.

The reason is the S in SMR: shingled. The data are written to the disk like shingles on a roof, and as with a roof, if you want to replace some of them the disk has to tamper with the surrounding shingles as well. Of course those SMR disks have a cache where they can put the old shingles that must not be overwritten, but if you write a lot of data that buffer will be exhausted, resulting in long I/O wait states.
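
To make the buffer argument concrete, here is a toy Python model (a sketch, not a description of any real drive's firmware): data arrives at one rate, the drive rewrites it into the shingled area at a slower rate, and the difference fills the staging cache. The function name and all numbers are hypothetical.

```python
def seconds_until_cache_full(cache_bytes, write_rate, drain_rate):
    """Time until the staging cache is exhausted.

    cache_bytes: size of the drive's non-shingled staging area (assumed value)
    write_rate:  sustained incoming write rate in bytes/s
    drain_rate:  rate at which the drive can rewrite shingled zones, bytes/s
    Returns None if the drive can keep up indefinitely.
    """
    if write_rate <= drain_rate:
        return None
    return cache_bytes / (write_rate - drain_rate)

if __name__ == "__main__":
    # Purely illustrative numbers: 20 GB staging area, 5 MB/s in, 3 MB/s drained.
    t = seconds_until_cache_full(20 * 10**9, 5 * 10**6, 3 * 10**6)
    print(f"cache exhausted after about {t / 3600:.1f} hours")
```

Once the cache is full in this model, incoming writes can only proceed at the drain rate, which is where the long I/O waits would come from.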

This leads to the reason why NAS manufacturers like Synology removed the WD RED disks with SMR technology from their compatibility lists even though WD advertises the disks as suited for the use in NAS devices. The NAS disk controllers react to overly long i/o waits and eject the disks from RAIDs, leading to a whole bunch of problems and work for users and admins.


I cannot say for sure if your issues are solely caused by the SMR disk, but due to the nature of the EUMETCast data stream it seems likely. Another factor might be ext4 as the filesystem does not know about the SMR "specialties". However, there are workarounds for your Debian / Devuan systems.

For one, there is a new filesystem called ZoneFS which specialises in SMR disks (and other so-called zoned devices); unfortunately I don't know if it is available for Debian, as it was intended to become a part of the 5.6 kernel.

A second solution is the 'dm-zoned' device mapper target. It adds an abstraction layer between hardware and filesystem / disk management system which hides and mitigates the constraints of SMR disks. Instead of using ext4 directly on the disk you format the disk with the 'dmzadm' tool first. This will give you a new block device that you can partition and format with ext4. Hope that helps!
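
Before trying either workaround it may be worth checking how the kernel sees the disk. As far as I know, both ZoneFS and dm-zoned need a disk that presents itself as a zoned block device (host-aware or host-managed), while drive-managed SMR disks, which many consumer models appear to be, look like ordinary disks to the kernel and report "none". A minimal Python sketch that reads the sysfs attribute; the helper names are hypothetical and the device name is an assumption:

```python
from pathlib import Path

def zoned_model(device, sysfs_root="/sys/block"):
    """Return the kernel's zoned model for a block device, e.g. 'sda'.

    Typical values are 'none', 'host-aware' and 'host-managed'.
    Returns None if the attribute cannot be read.
    """
    path = Path(sysfs_root) / device / "queue" / "zoned"
    try:
        return path.read_text().strip()
    except OSError:
        return None

def can_use_zoned_tools(model):
    """True if the disk exposes its zones to the host."""
    return model in ("host-aware", "host-managed")

if __name__ == "__main__":
    model = zoned_model("sdb")  # 'sdb' is just an example device name
    print(model, can_use_zoned_tools(model))
```

A result of "none" does not prove the disk is CMR; it only means the disk hides any shingling from the host.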


Regards,
Christoph




--
_____________________________________________

University of Bern
Department of Geography
Remote Sensing Research Group

Christoph Neuhaus
ICT Expert

Hallerstrasse 12
3012 Bern - Switzerland

mailto:christoph.neuhaus@...
skype: nihil14
http://www.geography.unibe.ch/remotesensing
_____________________________________________


Ernst Lobsiger
 

On Fri, Jul 17, 2020 at 09:21 AM, Christoph Neuhaus wrote:
This leads to the reason why NAS manufacturers like Synology removed the WD RED disks with SMR technology from their compatibility lists even though WD advertises the disks as suited for the use in NAS devices. The NAS disk controllers react to overly long i/o waits and eject the disks from RAIDs, leading to a whole bunch of problems and work for users and admins.
Hello Christoph

Nice to meet you here again. O.K. it seems with 3 months' delay I came across a little WD scandal
and now find more links on heise.de and also people complaining that they have bought WD RED drives
for a NAS on digitec.ch and are now stuck. My receivers are single cable with 3 TC clients writing and
overwriting randomly about 400GB per day. I noticed that my WD Blue 2.5'' has 128MB cache while my
WD Blue 3.5'' has 64MB. Christian Peters is lucky to operate a WD Red 2.5'' 1TB (still with CMR)!
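
For scale, the 400GB per day figure can be turned into an average write rate with a few lines of Python (a back-of-the-envelope sketch using decimal units; the function name is hypothetical):

```python
def average_write_rate_mb_s(gb_per_day):
    """Average sustained write rate in MB/s for a given daily volume in GB."""
    seconds_per_day = 24 * 60 * 60
    return gb_per_day * 1000 / seconds_per_day

if __name__ == "__main__":
    print(f"{average_write_rate_mb_s(400):.2f} MB/s")
```

The average is small compared with what a hard disk can stream sequentially, so the random overwriting around the clock is presumably what hurts an SMR disk, not the raw volume.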

Markus Kempf has recently built a Basic/HVS-1 receiver using a Debian based NAS with some help from me.

https://groups.io/g/MSG-1/topic/71144718#29134

But this is *not* only a problem of NAS but seems to concern all TC receivers I can think of.
So this thread will be of interest to many EUMETCast users that are about to buy new data HDDs.
TelliCast reception is certainly not storing blockbusters or music once and then consuming it.

I am currently testing new TBS PCIe hardware. So we will make comparisons with pure "receive on
RAM disk and throw away" receivers, so that I/O-Waits cannot interfere in any way. Then I will probably
build my next generation receivers with PCs that have room for one 2.5'' OS SSD + two 3.5'' data HDDs
that I'll put in software RAID0. Here is a cheap 2TB 3.5'' HDD that still seems to work with CMR recording.

https://www.microspot.ch/de/computer-gaming/speicher-laufwerke/hdd-festplatten--c561000/toshiba-p300-hd-sata-6gb-s-2-tb--p0001061777


Best regards
Ernst


Markus Kempf
 

Ernst,

I chose to use a cheap 1TB SSD (~80 Euros). It has been working for about 5 months. I guess it will last for >2 years. Then I can buy another one for even less money. The data products (and everything else) are stored on two 10 TB HDDs made by Toshiba.

Markus


Ernst Lobsiger
 

On Fri, Jul 17, 2020 at 12:36 PM, Markus Kempf wrote:
I chose to use a cheap 1TB SSD (~80 Euros). It has been working for about 5 months. I guess it will last for >2 years.
Hello Christoph and Markus

I installed (monster) smartmontools and did a smartctl -a on both /dev/sda and /dev/sdb.
TBH I normally just use skdump and sktest from (slim) libatasmart-bin and it is out of my
"comfort zone" to peek and poke stuff in drives even unknown to the smartctl database.
In my DELL Optiplex 7010 SFF I found no BIOS setting for disk power management either.

So it seems the only reliable solution today (proven by Christian Peters' Smaug receiver)
is the WD Red 2.5'' 1TB HDD (WD10JFCX) that is available in Switzerland for some CHF 80.-.
For future use, a closer look showed that the CHF 70.- Toshiba P300 3.5'' 3TB still uses CMR.

Not so long ago David Taylor did some investigations regarding SSDs for tmp and received.
Up to now this was not recommended due to excessive and rather fast wear in T1+T2 TelliCast
receivers that easily gather up to 500GB per day. Have SSDs become much better in the meantime?
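
A rough way to judge the wear question is to compare the daily volume with the endurance rating (TBW, terabytes written) from the SSD's data sheet. The Python sketch below does that arithmetic; the function name is hypothetical, the TBW values in the example are assumptions and not the rating of any particular drive, and the write amplification factor is a guess that depends on the SSD and the write pattern.

```python
def years_until_tbw(tbw_rating_tb, written_gb_per_day, write_amplification=1.0):
    """Years until the rated endurance is used up at a constant daily volume.

    tbw_rating_tb:        endurance rating from the data sheet, in TB written
    written_gb_per_day:   host writes per day, in GB
    write_amplification:  extra internal writes per host write (assumed)
    """
    tb_per_year = written_gb_per_day * write_amplification * 365 / 1000
    return tbw_rating_tb / tb_per_year

if __name__ == "__main__":
    # 500 GB per day is 182.5 TB per year.
    for tbw in (200, 400, 600):  # hypothetical ratings, check the real data sheet
        print(f"{tbw} TBW -> {years_until_tbw(tbw, 500):.1f} years")
```

The rating is a warranty figure and not a hard limit, so this only gives an order of magnitude.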


Regards
Ernst