{"id":7463,"date":"2014-06-19T03:59:45","date_gmt":"2014-06-19T03:59:45","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2014\/06\/19\/hard-disk-very-slow-failing-with-more-and-more-errors-collection-of-common-programming-errors\/"},"modified":"2014-06-19T03:59:45","modified_gmt":"2014-06-19T03:59:45","slug":"hard-disk-very-slow-failing-with-more-and-more-errors-collection-of-common-programming-errors","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2014\/06\/19\/hard-disk-very-slow-failing-with-more-and-more-errors-collection-of-common-programming-errors\/","title":{"rendered":"Hard disk very slow, failing with more and more errors-Collection of common programming errors"},"content":{"rendered":"<p>For a couple of days my Seagate Momentus 7200.4 has been failing more and more, possibly because of a power outage. After the &#8220;WARNING: Your hard drive is failing&#8221; message (I&#8217;m using Fedora), the main symptom was slowness: constant 100% I\/O wait for hours, which made it almost impossible to do anything. I made a backup, then restarted and had to run <code>e2fsck -y<\/code> (lots of output), which I had to repeat later (at one point the system didn&#8217;t even boot: kernel panic). I ran some short and long <code>smartctl<\/code> self-tests, and left the drive alone for a night to do its sector correcting or whatever.<\/p>\n<p>Now errors seem to be accumulating more slowly and the computer is mostly usable, but what should I do? Is there an fsck command that works better, or some other way to make the system skip the bad sectors and keep functioning, other than fixing the sectors one by one with hdparm? 
Or is the drive definitely headed for the trash?<\/p>\n<p>Excerpts from <code>smartctl -x \/dev\/sda<\/code>:<\/p>\n<pre><code>=== START OF READ SMART DATA SECTION ===\nSMART overall-health self-assessment test result: PASSED\n\nVendor Specific SMART Attributes with Thresholds:\nID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE\n  1 Raw_Read_Error_Rate     POSR--   085   074   006    -    243348742\n  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    0\n  7 Seek_Error_Rate         POSR--   084   060   030    -    238612361\n  9 Power_On_Hours          -O--CK   087   087   000    -    11535\n198 Offline_Uncorrectable   ----C-   100   100   000    -    8\n199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0\n240 Head_Flying_Hours       ------   100   253   000    -    132680129719553\n241 Total_LBAs_Written      ------   100   253   000    -    2525013242\n242 Total_LBAs_Read         ------   100   253   000    -    2162196433\n\nError 3759 [18] occurred at disk power-on lifetime: 11535 hours (480 days + 15 hours)\n  When the command that caused the error occurred, the device was active or idle.\n\n  After command completion occurred, registers were:\n  ER -- ST COUNT  LBA_48  LH LM LL DV DC\n  -- -- -- == -- == == == -- -- -- -- --\n  40 -- 51 00 00 00 22 7e 00 3d 2a 00 00  Error: UNC at LBA = 0x227e003d2a = 148142832938\n\n  Commands leading to the command that caused the error were:\n  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command\/Feature_Name\n  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------\n  60 00 00 00 08 00 22 7e 00 3d 28 40 00     18:38:24.892  READ FPDMA QUEUED\n  27 00 00 00 00 00 00 00 00 00 00 e0 00     18:38:24.891  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]\n  ec 00 00 00 00 00 00 00 00 00 00 a0 00     18:38:24.889  IDENTIFY DEVICE\n  ef 00 03 00 46 00 00 00 00 00 00 a0 00     18:38:24.889  SET FEATURES [Set transfer mode]\n  27 00 00 00 00 00 00 00 00 00 00 e0 00     18:38:24.889  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]\n\n\nSMART Extended Self-test Log Version: 1 (1 sectors)\nNum  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error\n# 1  Extended offline    Completed: read failure       90%     11528         574443398\n<\/code><\/pre>\n<p>More output: http:\/\/p.defau.lt\/?DTSGCmr7mb_anDD3IQ9Bgg http:\/\/p.defau.lt\/?hNM7_BusGyz4DYLi9XX0Kg http:\/\/p.defau.lt\/?wQArANAXPLnpyD87xUY6CA http:\/\/p.defau.lt\/?hXbtLh27yFZhySu0y9axJw<\/p>\n<p><strong>Update<\/strong>: since you said the disk should be trashed anyway, I ran <code>dmesg | grep -oE 'sector.+$' | sort -u<\/code> to list the sectors reported in the kernel log, and then used <code>sudo hdparm --write-sector --yes-i-know-what-i-am-doing<\/code> on about a dozen of them. I&#8217;m now running another test; let&#8217;s see what comes out of it.<\/p>\n<p><strong>Update 2<\/strong>: I had to fix some more bad sectors manually with hdparm but, a night later, all the errors I find in the system log seem to have been corrected automatically, as they normally should be. In the meantime I ran into some funny errors, like sound distorted \u00e0 la techno music and grep freaking out, but a yum update may have been enough to repair them. The last <code>smartctl -a \/dev\/sda<\/code> completed without errors; I now have &#8220;ATA Error Count: 5004&#8221; and a raw value of 2 for both 197 Current_Pending_Sector and 198 Offline_Uncorrectable.<\/p>\n<p><strong>Update 3<\/strong>: the system is mostly usable, but the problems persist: &#8220;ATA Error Count: 9484&#8221;. I still sometimes have to use the hdparm trick, but I don&#8217;t think it works properly, because the problem later reappears on the next sector. Offline_Uncorrectable is not growing, so I suspect the disk is failing to remap (reallocate) bad sectors. I guess I have to give up and buy a new one&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For a couple of days my Seagate Momentus 7200.4 has been failing more and more, possibly because of a power outage. 
After the &#8220;WARNING: Your hard drive is failing&#8221; message (I&#8217;m using Fedora), the main symptom was slowness: constant 100% I\/O wait for hours, which made it almost impossible to do anything. I made a backup, then [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-7463","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7463","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=7463"}],"version-history":[{"count":0,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/7463\/revisions"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=7463"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=7463"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/tags?post=7463"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}