From Botched Wolf, 1 Year ago, written in Plain Text.

mgdlbp 4 hours ago

Apparently, a few months ago it became known on the Chinese internet that the 980 Pro, the 970 Evo Plus with the new controller, and their OEM versions are prone to developing unreadable sectors, where the SMART 'Media and Data Integrity Errors' count increases on every read attempt.
https://www.reddit.com/r/buildapc/comments/x82mwe/samsung_ss... https://www.reddit.com/r/DataHoarder/comments/x8arle/psa_sam...

How I came across this: I ran into it last week(!) on a 6-month-old drive, but I'm not in China... hmm. Not just one bad batch? Interestingly, it's nondeterministic. The data is backed up, but when I try ddrescue, it occasionally succeeds at reading a few kilobytes out of the roughly 5 MB that can't be read or written (several runs of 512 to 16384 bytes each). Curious to see what happens with a firmware update and a secure erase.

jamhan 3 hours ago

My anecdata:
tl;dr: All 3 of my Samsung M.2 NVMe SSDs have failed in less than 3 years. 100% failure rate.

My first SSD was a 1TB Samsung 970 EVO. It failed after 2 years and 8 months. It was replaced under warranty with a 1TB 970 EVO Plus.

That replacement has now also failed, after 1 year and 9 months.

I bought a 2nd 1TB 970 EVO Plus in May 2019. It has now also failed (2 years and 7 months).

Both are expected to be replaced under warranty.

The 2 970 EVO Plus SSDs clearly had hardware errors (which were not accurately reflected in the SMART data) that caused everything from system hangs and game crashes to file corruption on OTHER drives. I couldn't believe it at first, but after 5 days of testing and trial and error, I had it confirmed. As soon as I removed those SSDs, my PC was completely stable again.

In the meantime, I have bought a Kingston KC3000 1TB drive, as I no longer trust Samsung M.2 NVMe SSDs. On the other hand, I have a Samsung 850 EVO SATA drive which has been rock-solid.

DriverDaily 2 hours ago

My anecdata: I have been running 4x 500GB Samsung 850 EVOs in RAID 0 continuously without failures since early 2015.

mynameisvlad 1 hour ago

The article mentions issues with the 900-series drives. It seems like the 800-series are still rock solid (I've also been running them for a few years now without issue).

Godel_unicode 1 hour ago

Similarly, my M.2 NVMe 950 Pro has been in an always-on machine that gets a ton of use since 2016.

AuthorizedCust 2 hours ago

The parent posts mentioned the 970 and 980, not the 850.

metadat 3 hours ago

I've bought 6-8 M.2 Samsung 970 EVO Plus and 980 drives since 2018, and none have failed to date.
Anecdata is the worst; I'm sorry to hear about this happening to you. It's surely frustrating and upsetting.

post-it 3 hours ago

Is it possible that your motherboard or PSU is killing the drives?
Could also just be sheer chance, of course.

Macha 1 hour ago

My anecdata: I have an 840 Pro, 850, 850 EVO, 970 and 980 Pro, all still running after years.

dinvlad 2 hours ago

Worth checking if you have any thermal issues with it. Mine failed in a similar way, presumably due to the rookie mistake of forgetting to remove the tape from the thermal pad on the mobo.

jeffbee 1 hour ago

It's not likely that thermal issues would cause bad reliability on these things. At worst you could expect intermittently bad performance. You can check for this condition with `nvme smart-log`. If your device was often overheated, it would have a non-zero "critical composite temperature time". My Samsung, which has been in service for years with no thermal solution, has a value of 1 minute, and I happen to know that is because I heated it with a hair dryer to find out what would happen if it crossed the critical temperature.

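A minimal Python sketch of the check jeffbee describes, assuming nvme-cli is installed and that `nvme smart-log` prints plain-text `name : value` lines. Field labels and spacing differ between nvme-cli versions, so the normalized key `critical_composite_temperature_time`, the device path `/dev/nvme0`, and the helper names are all assumptions for illustration.

```python
import re
import subprocess


def parse_smart_log(text):
    """Parse `name : value` lines into a dict keyed by a normalized name.

    The line format is an assumption about nvme-cli's plain-text output.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "name : value" line
        # "Critical Composite Temperature Time" -> "critical_composite_temperature_time"
        key = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")
        fields[key] = value.strip()
    return fields


def leading_int(value):
    """Return the first integer in a value such as '1,234' or '35°C', else None."""
    match = re.search(r"\d[\d,]*", value)
    return int(match.group().replace(",", "")) if match else None


def read_smart_log(device="/dev/nvme0"):
    """Run `nvme smart-log` on a device (usually needs root) and parse the output."""
    result = subprocess.run(["nvme", "smart-log", device],
                            capture_output=True, text=True, check=True)
    return parse_smart_log(result.stdout)
```

For example, `leading_int(read_smart_log().get("critical_composite_temperature_time", ""))` would give the minutes spent above the critical temperature, or `None` if your nvme-cli version labels the field differently.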
dinvlad 1 hour ago

Ha, interesting! Makes sense; the drive is supposed to just throttle itself before it can reach unsafe temps. I'll definitely check, I didn't know the drive recorded that. Thanks for the tip. In any case, now I know an RMA is in order.

voidfunc 4 hours ago

Hmm, I'm going to need to check my Samsung SSD from Oct 2021 that failed the first week of Jan 2023. I had started noticing some quirks in spring 2022, but it wasn't a super important drive, so I ignored it.

Dnguyen 3 hours ago

I have a similar issue. It started failing in the middle of last year, then it got more and more frequent toward the end of the year. Last month I got tired of reinstalling the OS for the 4th time and got a new system.

sam0x17 1 hour ago

My 980 Pro failed within two months of my purchasing it in late 2022.

dheera 3 hours ago

I wonder if the QVO drives are also subject to the same issues.

mmis1000 3 minutes ago

There is a Chinese YouTuber doing SSD durability tests on a few SSDs from different vendors, and one of the tested SSDs is a Samsung 980 Pro.
The funny thing? The Samsung 980 died before the wear test even started.

https://youtu.be/tXYQZHz7u3w?t=898

justinclift 3 hours ago

As a data point, the Linux kernel has a long list of workarounds for "ata" related devices (SSDs, HDDs, etc.):
https://github.com/torvalds/linux/blob/69f2c9346313ba3d3dfa4...

It can be a bit eye-opening to look down that list and see equipment you're using. ;)

userbinator 2 hours ago

For the same price, you can get twice the space for 1/4 the endurance, thrice the space for 1/8th the endurance, and now four times the space for 1/16th the endurance. Most people don't realise that is a horrible tradeoff, because NAND flash marketing and terminology like "TLC" or "QLC" is intentionally deceptive, and manufacturers have been very secretive about the true endurance specifications, as well as trying to overprice SLC out of production. If more people knew the truth of what they were trying to do, we wouldn't be in this situation.

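The "horrible tradeoff" is about total writes over the drive's life, which is capacity multiplied by per-cell endurance. A small Python sketch using the ratios exactly as stated in the comment above (taken at face value; they are the commenter's figures, not vendor datasheet values):

```python
from fractions import Fraction

# Ratios as stated in userbinator's comment, relative to SLC at the same price:
# (capacity multiplier, endurance multiplier). Not vendor datasheet figures.
RELATIVE_TO_SLC = {
    "SLC": (1, Fraction(1)),
    "MLC": (2, Fraction(1, 4)),
    "TLC": (3, Fraction(1, 8)),
    "QLC": (4, Fraction(1, 16)),
}


def lifetime_write_budget(cell_type):
    """Total data writable over the drive's life relative to SLC: capacity x endurance."""
    capacity, endurance = RELATIVE_TO_SLC[cell_type]
    return capacity * endurance


for cell_type in RELATIVE_TO_SLC:
    print(cell_type, lifetime_write_budget(cell_type))
```

This prints 1, 1/2, 3/8 and 1/4: by these numbers, the QLC drive holds four times as much but can absorb only a quarter of the total writes of the SLC drive bought for the same money.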
wahern 1 hour ago

> as well as trying to overprice SLC out of production.
Is it even possible to buy SLC drives any more? For the past 5+ years, the only outlet I've been able to find that even advertises SLC is https://www.delkin.com/, and you need to speak to sales to even get a price. I just assumed they and any other similar suppliers bought giant lots of chips at the tail end of SLC production and jack up the price on every new order as their supply dwindles. Or maybe they cobble together drives from the tiny SLC chips used for cache on modern SSDs?

userbinator 15 minutes ago

> Is it even possible to buy SLC drives any more?
Yes, small ones for industrial use. They're extremely expensive, however.

Looking at raw NAND flash prices, SLC still seems to be around $4.60 USD/GB, roughly the same as it was over a decade ago, while MLC is already under $1 USD/GB despite storing only twice as many bits per cell. TLC and QLC seem to be down around $0.10 USD/GB. You can still buy raw SLC NAND flash in the smaller capacities of a few GB; this one is only 512MB, at the price mentioned above:

https://www.newark.com/micron/mt29f4g08abaeawp-it-e/flash-me...

If the pricing were sane, SLC drives would be only 4x as expensive as QLC ones for the same capacity, but that's not what we're seeing today.

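A quick check of the "sane pricing" claim, under the commenter's implicit assumption that cost scales only with the number of cells (ignoring production volume, yield and controller cost). The prices are the approximate per-GB figures quoted above.

```python
# Raw NAND prices per GB as quoted in the comment above (approximate).
SLC_USD_PER_GB = 4.60
QLC_USD_PER_GB = 0.10

SLC_BITS_PER_CELL = 1
QLC_BITS_PER_CELL = 4

# If cost tracked only the number of cells, SLC would need 4x the cells per GB.
cell_ratio = QLC_BITS_PER_CELL / SLC_BITS_PER_CELL
price_ratio = SLC_USD_PER_GB / QLC_USD_PER_GB

print(f"cells per GB: {cell_ratio:.0f}x, price per GB: {price_ratio:.0f}x")
```

With these inputs the observed price gap is about 46x against a cell-count gap of 4x.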
wmf 1 hour ago

While NAND endurance has certainly gone down, FTLs got much better over the same period, so SSD endurance is still fine for most people. And if the stock endurance isn't enough, a little overprovisioning is probably better than going back to very expensive MLC.

userbinator 3 minutes ago

If the number of firmware bugs in SSDs we've seen over time is any indication, I don't think things are really getting better...
SLC needs almost no FTL. 100K P/E cycles of endurance. A very low raw error rate that can be handled with basic ECC.

emodendroket 1 hour ago

Is it a horrible tradeoff, though? I can think of many situations where that would be a somewhat compelling alternative to a spinning platter.

HyperSane 33 minutes ago

It isn't a bad tradeoff for most read-heavy workloads.

newZWhoDis 1 hour ago

Can you explain your point further? Are you talking about competitors?

geocrasher 4 hours ago

Funny thing. This article prompted me to check the health of my two Samsung SSDs (a 250GB 850 EVO SATA III and a 970 EVO Plus 1TB NVMe), which were fine.
But Samsung's Magician also listed my Seagate ST2000DM008-2FR102 2TB spinny disk. It found a SMART error. I ran a performance test and looked at SMART again, and the "Hardware ECC Recovered" value went from 80 to 81, with a threshold of 64. My other software labels this as "good". Nevertheless, this drive is now being replaced by a 4TB WD Blue. Thanks, article. Saved me some future troubles!

megous 3 hours ago

The value rising from 80 to 81 is an improvement. The calculated (normalized) value decreases when the raw value of "Hardware ECC Recovered" worsens.

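The point megous makes can be sketched in Python. For ATA SMART attributes, smartctl reports an attribute as failing when its normalized value drops to or below a non-zero threshold; raw values are vendor-specific and are not compared. Normalized values are generally scaled so that higher is better, though the scaling itself is vendor-defined. `failing_now` is a hypothetical helper encoding that rule, applied to the numbers from geocrasher's comment.

```python
def failing_now(normalized_value, threshold):
    """smartctl's rule for ATA attributes: failing when the normalized value is at
    or below a non-zero threshold. Raw values are vendor-specific and not compared."""
    return threshold > 0 and normalized_value <= threshold


# Numbers from geocrasher's comment: the value went from 80 to 81, threshold 64.
for value in (80, 81):
    status = "FAILING_NOW" if failing_now(value, 64) else "ok"
    print(f"Hardware ECC Recovered = {value} (threshold 64): {status}")
```

Both readings are well above the threshold, and the move from 80 to 81 is a step away from it.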
stavros 1 hour ago

Is this the case for all SMART values? Higher = better?

arprocter 3 hours ago

You might want to double-check that SMART status in CrystalDiskInfo.
I don't trust Magician to report correctly on other vendors' storage.

opencl 3 hours ago

Hardware ECC Recovered represents the amount of time between error correction events, so a higher number is better.

kiririn 2 hours ago

Completely normal.

dheera 3 hours ago

How do I check the health of a Samsung SSD on Linux?

javaunsafe2019 3 hours ago

LMGTFY: `sudo smartctl -t long -a /dev/sdX`

jeffbee 1 hour ago

That's not the way for a modern SSD. Try `sudo nvme smart-log /dev/nvmeN`.

Izkata 22 minutes ago

Seems to give the same output as the second section of what `smartctl --all` gives... so, less information.
Aside: any idea why it thinks my drive is 208% used?

jeffbee 15 minutes ago

No idea on that one. All three of mine indicate 0%, but I've seen wacky stuff from SMART indicators over the years.

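A reading like Izkata's 208% is allowed by the NVMe specification: Percentage Used is a vendor estimate of life consumed, may exceed 100, and saturates at 255. Reaching 100 means the rated endurance is used up, not that the drive has failed. A small Python sketch for interpreting that field and the Data Units Written counter (which the spec defines in units of 1000 x 512 bytes); the helper names are hypothetical.

```python
# From the NVMe base specification's SMART / Health Information log:
# - Data Units Written is reported in units of 1000 x 512 bytes (512,000 bytes).
# - Percentage Used is a vendor estimate of life used; it may exceed 100
#   and values above 254 are reported as 255.
BYTES_PER_DATA_UNIT = 1000 * 512


def terabytes_written(data_units_written):
    """Convert the Data Units Written counter to terabytes (10**12 bytes)."""
    return data_units_written * BYTES_PER_DATA_UNIT / 10**12


def describe_wear(percentage_used):
    """Interpret Percentage Used; 100 or more means rated endurance is used up,
    which does not by itself mean the drive has failed."""
    if percentage_used >= 100:
        return "rated endurance consumed"
    return f"{100 - percentage_used}% of rated endurance remaining"
```

For example, `terabytes_written(1_000_000)` is 0.512 TB, and `describe_wear(208)` reports the rated endurance as consumed.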
smiley1437 5 hours ago

I've been trying to find a decent-endurance NVMe drive in the M.2 form factor for write-heavy applications, and it appears that true 2-bit MLC has all but disappeared, replaced by 3-bit TLC and higher (with a commensurate loss of endurance).
The high-endurance SSDs appear to be available only in U.2/U.3/HHHL and, god help me, EDSFF form factors.

Any suggestions? Micron's 7450 isn't readily available.

EarthLaunch 4 hours ago

They don't really make them anymore, but you can still get M.2 form factor Intel Optane SSDs in the 900/905P series, for example [0]. They have insane endurance specs. Their performance is also still awesome, especially for random reads/writes [1]. I wish they had continued making them. Most PC builders just bought crappy Samsung SSDs this whole time, ignoring these awesome (and high-priced) drives.
> Life Expectancy: 1.6 million hours Mean Time Between Failures (MTBF)

> Lifetime Endurance: 10 Drive Writes per Day (DWPD)

There was one update, but I don't believe it's M.2? [2]

0: https://www.newegg.com/intel-optane-ssd-905p-series-380gb/p/...

1: https://ssd.userbenchmark.com/ (sort by "Avg Bench" and you'll see these old Optanes still in the top 10)

2: https://www.intel.com/content/www/us/en/products/docs/memory...

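An MTBF figure like the 1.6 million hours quoted above is a population statistic, not an expected lifetime. One common way to read it is as an annualized failure rate (AFR). The Python sketch below assumes a constant failure rate (exponential model) and a drive powered on all year; both are simplifying assumptions.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760; assumes the drive is powered on all year


def annualized_failure_rate(mtbf_hours):
    """AFR under a constant-failure-rate (exponential) model: 1 - exp(-T / MTBF)."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


# 1.6 million hours is the MTBF quoted from Intel's spec above.
print(f"{annualized_failure_rate(1.6e6):.2%}")
```

That works out to roughly 0.55% of such drives failing per year, which says nothing about how long any single drive will last.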
nine_k 3 hours ago

Most PCs experience very little write load; I can imagine that many of them see less than one full drive write over their lifetime.
A database server box, or even a CI build box, is a whole different business.

HyperSane 20 minutes ago

Use larger drives and use RAID to distribute writes over more drives. Accept that a TLC drive heavily used for writes is a consumable, and act accordingly.

OGWhales 4 hours ago

You may also want to ask over on Reddit in r/NewMaxx. It's a good place for SSD info, and there is a pinned post for asking questions like this.

walterbell 3 hours ago

NewMaxx SSD references: http://ssd.borecraft.com/

zorgmonkey 3 hours ago

Why not get a U.2 drive and an adapter like this one? https://www.startech.com/en-us/hdd/m2e4sff8643

walterbell 3 hours ago

WD Red SSDs are targeted at NAS use cases and claim an endurance of 1 PBW per TB: https://www.tomshardware.com/reviews/wd-red-sn700-review

adgjlsfhk1 4 hours ago

While it's true that MLC is mostly dead, you might want to consider a higher-capacity TLC SSD. SSD endurance is rated in drive writes per day, so if you double your capacity, you double the total data you can write, and a bigger SSD will likely have a bigger SLC cache to help with write speed.

spyder 49 minutes ago

Yep, you can even adjust the overprovisioning manually (at least on Samsung). So if you need more endurance (and better random write performance), just buy a bigger drive and increase the overprovisioning allocation. I found this good summary, with graphs showing the impact:
https://www.atpinc.com/blog/over-provisioning-ssd-benefits-e...

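Manual overprovisioning usually amounts to leaving part of the drive unpartitioned. A Python sketch of the sizing arithmetic, using the common definition OP = (total - user) / user. It assumes the unallocated area is trimmed or never written, so the controller can use it as spare area, and it ignores the factory overprovisioning the drive already has. `partition_size_gb` is a hypothetical helper.

```python
def partition_size_gb(capacity_gb, extra_op):
    """Size to partition so that `extra_op` (e.g. 0.10 for 10%) of spare area is
    left unallocated, using the common definition OP = (total - user) / user,
    i.e. user = total / (1 + OP). Factory overprovisioning is ignored."""
    if extra_op < 0:
        raise ValueError("extra_op must be non-negative")
    return capacity_gb / (1 + extra_op)


# Example: a 2000 GB drive with 10% extra overprovisioning.
print(round(partition_size_gb(2000, 0.10), 1))
```

Some vendors express OP as a fraction of total capacity instead, so check which convention your tool uses before picking a number.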
dinvlad 2 hours ago

Seems weird that only the 2TB models fail, then.

wmf 1 hour ago

This sounds like a firmware bug that has nothing to do with endurance.

rektide 4 hours ago

Going from 2-bit to 3-bit cells is a 1.5x capacity bump.
I would not be shocked to find that TLC has a >1.5x impact on DWPD.

aidenn0 4 hours ago

But it might be easier to find, e.g., a 3x-sized TLC drive than a 1x-sized MLC one in the form factor GP wants.

adgjlsfhk1 4 hours ago

For a simple example, the 970 Pro (MLC) had a 1200 drive-write warranty, while the 980 Pro (TLC) only has a 600 drive-write warranty, but the 2TB 980 Pro is cheaper than the 1TB 970 Pro, so you can get the same endurance for less.

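The arithmetic behind this comparison: rated terabytes written (TBW) is the number of full-drive writes multiplied by capacity, and dividing TBW by capacity and warranty days gives drive writes per day (DWPD). A Python sketch using the drive-write figures from the comment; the 5-year warranty in the last line is an assumption for illustration only.

```python
def total_tbw(drive_writes, capacity_tb):
    """Rated terabytes written = rated full-drive writes x capacity."""
    return drive_writes * capacity_tb


def drive_writes_per_day(tbw, capacity_tb, warranty_years):
    """DWPD implied by a TBW rating spread evenly over the warranty period."""
    return tbw / (capacity_tb * 365 * warranty_years)


# Drive-write figures from adgjlsfhk1's comment.
tbw_970_pro_1tb = total_tbw(1200, 1)  # MLC
tbw_980_pro_2tb = total_tbw(600, 2)   # TLC
print(tbw_970_pro_1tb, tbw_980_pro_2tb)  # 1200 1200

# Assumed 5-year warranty, for illustration only.
print(round(drive_writes_per_day(tbw_980_pro_2tb, 2, 5), 2))
```

Both drives come out at 1200 TB of rated writes, which is the "same endurance for less" point.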
mnadkvlb 5 hours ago

I recommend the Samsung PM9A3. It's not as popular, but it's an enterprise product, and the endurance is something like 3 times that of the 980 Pro, I believe (please check; I'm not 100% sure).
I've been using them in my Threadripper workstation with a lot of VMs, which are put to sleep every day, with around 0.25 TB written and read each time the VMs are started. Keep in mind these are in the 22110 form factor.

jeffbee 4 hours ago

These are not available with retail support, so if you manage to acquire one (which may have been pulled from service with an unknown level of previous wear), you will get ZERO support from Samsung no matter what goes wrong.

mnadkvlb 4 hours ago

Well, I bought it from Digitec here in Switzerland, and they take care of the warranty. Other than that, I don't expect any support from any SSD vendor.
I rely on the store I buy from where I live. That's the reason I only buy either in person or from Amazon Germany (their support has been rock solid over the 10 years I have been using them).

Aardwolf 5 hours ago

I have this exact model, the 980 Pro 2TB.
It says to update the firmware, but how can you do that from Linux? The instructions are all about some Windows program. Thanks!

EDIT: I'm quite happy with the warning from this article; it fixed a potential future problem!

ggreer 3 hours ago

If you're on Linux, you probably want to use fwupd [1]. You can check your drive's current firmware version by running `fwupdmgr get-devices`. The version with the fix is 5B2QGXA7.
I'm on Arch, and apparently I installed the update at some point in the past.

1. https://wiki.archlinux.org/title/fwupd

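As an alternative to `fwupdmgr get-devices`, the Linux NVMe driver exposes each controller's model and firmware revision in sysfs (`/sys/class/nvme/nvmeX/model` and `firmware_rev`). The Python sketch below reads those files and flags a 980 PRO still on 3B2QGXA7, the version commenters upgraded from. The flagging rule and helper names are assumptions for illustration, not Samsung's official check.

```python
from pathlib import Path

OLD_FIRMWARE = "3B2QGXA7"    # version commenters upgraded from
FIXED_FIRMWARE = "5B2QGXA7"  # version reported in this thread as containing the fix


def nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Yield (name, model, firmware_rev) for each NVMe controller listed in sysfs."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return
    for ctrl in sorted(root.iterdir()):
        model_file, fw_file = ctrl / "model", ctrl / "firmware_rev"
        if not (model_file.is_file() and fw_file.is_file()):
            continue
        yield ctrl.name, model_file.read_text().strip(), fw_file.read_text().strip()


def needs_update(model, firmware):
    """True for a 980 PRO still on the firmware that commenters replaced."""
    return "980 PRO" in model.upper() and firmware == OLD_FIRMWARE


for name, model, firmware in nvme_controllers():
    flag = "update to " + FIXED_FIRMWARE if needs_update(model, firmware) else "ok"
    print(f"{name}: {model} firmware {firmware} -> {flag}")
```

Reading sysfs needs no root and no extra packages, but it only reports the version; the update itself still has to go through Samsung's tool or nvme-cli.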
aendruk 2 hours ago

Samsung is publishing some firmware, but not this one. (https://fwupd.org/lvfs/vendors/#samsung, https://github.com/fwupd/fwupd/issues/5477)
    $ fwupdmgr get-updates
    Devices with no available firmware updates:
     • SSD 980 PRO 2TB
It would be good to put some pressure on Samsung to use the Linux Vendor Firmware Service. I just opened a support ticket about it.
fwupd is at least manually adding a warning about the affected firmware. https://github.com/fwupd/fwupd/pull/5481

pentamassiv 5 hours ago

A few months ago I updated my Samsung SSD by following this procedure: https://askubuntu.com/a/1386451. In theory, Samsung provides an image to boot from to do the update, but the image seems very outdated and did not recognize my keyboard, so it was unusable.

Aardwolf 4 hours ago

I have now found and followed this:
https://blog.quindorian.org/2021/05/firmware-update-samsung-...

And it seems to have worked. After extracting the updater tool and running it, smartctl kept showing the old firmware version (3B2QGXA7), but after a reboot it now shows the new version (5B2QGXA7).

I took the risk of running this while the OS (Arch Linux) was running with the disk mounted (it's the OS install disk), and at first sight this didn't cause issues. But still, do it at your own risk!!

YoumuChan 33 minutes ago

The command can be simplified to:
    isoinfo -R -i xxx.iso -x /initrd | gzip -dc | cpio -idv --no-absolute-filenames "root/fumagician*"
if you don't want to go through mounting and extracting everything.

FullyFunctional 3 hours ago

Thanks, but that takes me to old firmware. I was, however, able to download the new firmware from https://semiconductor.samsung.com/consumer-storage/support/t... and use the same procedure, and it worked:
    ├─SSD 980 PRO 2TB:
    │     Device ID:          03281da317dccd2b18de2bd1cc70a782df40ed7e
    │     Summary:            NVM Express solid state drive
    │     Current version:    5B2QGXA7

My home directory is on a non-redundant stripe of two 980 Pros, which both had the bad firmware, so I was obviously motivated, but not panicked, as it's replicated hourly to spinning rust (and I have offsite backups). I treat flash memory as dynamic RAM with only slightly better retention.

aendruk 51 minutes ago

It also seems to have worked for me going from 3B2QGXA7→5B2QGXA7 on NixOS: extract the ISO, extract the initrd, run fumagician, reboot.

qwertox 4 hours ago

I think I managed to upgrade the firmware from within Kubuntu (it wasn't the OS NVMe drive) by using this method:
https://blog.quindorian.org/2021/05/firmware-update-samsung-...

I did it some months ago, so I don't remember if it was exactly that one.

I'm now relieved to know that 5B2QGXA7 is still the current version.

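rogers18445 3 hours ago

An example:
    nvme fw-log /dev/nvme0                         # firmware revision in each slot, and the active slot
    nvme id-ctrl /dev/nvme0 -H | grep Firmware     # decoded firmware capabilities (slot count, activation without reset)
    nvme fw-download -f firmware.ebin /dev/nvme0   # transfer the image to the controller (file name is a placeholder)
    nvme fw-commit /dev/nvme0 -s 2 -a 3            # commit to slot 2; action 3 = replace and activate immediately
    nvme fw-log /dev/nvme0                         # confirm the new revision is active

In the unlikely event that slot 2 doesn't work for your drive, you may need to change the slot (-s).
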
acidburnNSA 5 hours ago

Following, from my Linux desktop. I lost a system SSD that was a 980 2TB and recently reinstalled everything, thinking it was a fluke. Now I'm worried it will happen again soon.

TacticalCoder 5 hours ago

From TFA, it's not just the 980 Pro 2TB but also all the newer 990s, so it's problematic.

Aardwolf 4 hours ago

The article switches from the 980 issue to the 990 issue in a somewhat unclear way, but I think they're independent problems, and the firmware update should fix the 980 one?

pja 4 hours ago

NVMe drives can be updated from the command line. I've done it myself.
Extracting the actual firmware update from the files Samsung gives you might be an issue, though.

amluto 4 hours ago

nvme-cli can, in principle, upload firmware. I've never personally tried it.

babypuncher 4 hours ago

I just put the same drive in my PS5 in December. None of my Windows machines have spare NVMe slots, so updating it should be interesting.

867-5309 4 hours ago

Boot Windows from a SATA drive?

ssdpain 5 hours ago

I installed 2 x 980 Pro 2TB drives in a laptop in Nov 2022. A daily Robocopy batch script that backs up a folder on C: to D: would freeze a couple of times a week and lock the D: drive. After a reboot, a drive check would find no errors and everything would work as normal. I've used the same script for years with no issues.
Since the firmware update last week, Robocopy has not frozen the drive at all.

zerocrates 4 hours ago

The freeze/reboot/fine cycle seems to be a common one for SSDs acting poorly: running out of blocks they want to use internally, or memory or cache or something, or just hanging in their own firmware for whatever other reason.
In one of my earlier forays into switching to SSDs, I installed Intel (I think it was 525, 535, something like that) 2.5-inch SATA drives in several different machines. Every one has failed by now with this similar mode of (in)operation. On my desktop, where I had one, it would simply bluescreen but then come back fine, until eventually reading certain parts of the disk would always cause it to hang, and it had to be replaced. Failed SSDs like this are interesting because Windows (and to a lesser extent Linux) really isn't prepared for the disk to just hang, so trying to recover anything from them can be a challenge.

Just recently I found out that the last one I had around, in a little headless desktop server, was the cause of my problems with it, where it would partially hang after a couple of days of uptime. Having finally gotten around to hooking it up to a display, I was treated to a sea of red dmesg errors from the disk.

I think ultimately part of the problem was new power-saving features Intel had tried to add for these disks, which would cause them to write to themselves a large amount and just eat through their useful lifetime much faster than you'd assume.

In almost every case, I replaced these with, of course... Samsungs. Though I believe I've been lucky enough not to choose any of their bad ones.

wtallis 4 hours ago

> The freeze/reboot/fine cycle seems to be a common one for SSDs acting poorly, running out of blocks they want to use internally, or memory or cache or something, or just hanging in their own firmware for whatever other reason.
Given that it's two Gen4 drives in a laptop being subjected to a moderately heavy sustained workload, I'd also suspect a thermal problem, or maybe even power delivery. Those two slots are probably being fed from the same 3.3V regulator.

Since the firmware bug appears to have caused catastrophic write amplification, what may seem to the user to be only a modest and reasonable workload may be causing the drive that is the backup destination to run at full tilt, doing a ton of writes to the flash and hitting its peak power consumption and heat output.

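Write amplification is the ratio of data the controller actually writes to flash to the data the host asked it to write. A Python sketch with hypothetical numbers. Host writes can be derived from the standard NVMe "Data Units Written" counter, but flash (NAND) writes usually come only from a vendor-specific log, if the drive exposes them at all.

```python
def write_amplification(nand_bytes_written, host_bytes_written):
    """Write amplification factor: bytes written to flash / bytes written by the host."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written


# Hypothetical figures for illustration only; real values would come from the
# NVMe Data Units Written counter (host) and a vendor-specific log (flash).
host_tb = 1.0
nand_tb = 3.5
waf = write_amplification(nand_tb, host_tb)
print(f"WAF = {waf:.1f}: the flash wears {waf:.1f}x faster than host writes suggest")
```

Under a firmware bug of the kind wtallis describes, the same host workload would drive this ratio up, which would raise the drive's power draw and heat output along with its wear.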
verall 1 hour ago

Yeah, I had an old Intel SSD that would hang like this, and I could never figure out wtf was going on.

issafram 5 hours ago

Could you provide that batch script, please? Like in a GitHub Gist or something similar.

ssdpain 5 hours ago

Sure...
    @echo off
    pause
    robocopy "C:\Users\o\Desktop\2023" "D:\2023" /e /mir /np /v /tee /r:0 /w:0 /log+:"C:\Users\o\Desktop\log_robocopy.txt"
    pause
    @echo on

skissane 4 hours ago

You don't need to turn echo back on at the end of your batch file. That line is pointless.

thrdbndndn 38 minutes ago

And you don't need /e, since /mir = /e /purge.

dmitrygr 5 hours ago

> the firmware update last week
Link to the specific firmware version, please?

ssdpain 5 hours ago

The new firmware is version 5B2QGXA7, updated via Magician on Windows. I didn't make a note of earlier firmware versions. It's still too soon to know if the SSD freeze will recur.

dmitrygr 4 hours ago

Thank you.

walrus01 4 hours ago

What does the SMART data for your drive say?
I'm morbidly curious how much lifespan remaining it reports for its internal wear-leveling system.

evil-olive 3 hours ago

Every machine I have that can fit 2 SSDs (basically, everything except the very slim laptops) I have converted to running a ZFS mirror as its root filesystem. NixOS makes this very easy to do, because the grub.mirroredBoots option [0] removes the need for a separate "bootpool" with limited ZFS feature flags.
And crucially, I always make sure they're 2 drives from different manufacturers, so that a bug of this nature should never be able to take down both drives in a pool simultaneously.

I think of this as the "if you're going to go to the trouble of wearing a belt and suspenders, make sure to buy them from separate brands" principle.

0: https://search.nixos.org/options?channel=22.11&show=boot.loa...

  664. gregatragenet3 3 hours ago | prev | next [–]
  665.  
  666. This posting came two months late for me.
  667. My 980 2TB crossed the river styx over the holiday break. Failure mode exactly as described. Nice Christmas present for me. Took 3 weeks to get the warranty replacement from Samsung.
  668.  
  669. reply
  670.  
  671.        
  672. aporetics 4 hours ago | prev | next [–]
  673.  
  674. Just a note as a happy customer of Puget Systems that my experience working with them has been excellent, they really seem to be expert in their field, and have been for many years.
  675. Also, their aquarium computer submerged in mineral oil was really cool, back in the day.
  676.  
  677. reply
  678.  
  679.        
  680. danielodievich 3 hours ago | parent | next [–]
  681.  
  682. I confirm. 2 Puget Systems workstations in this house. Just opened both of them up to add more hard drives for games and game work storage, and the cabling is so lovely. In my new gaming box I have a Samsung 980 Pro 1TB, which according to this note is unaffected, and I couldn't find it on my motherboard. I created a support ticket and support immediately responded with a very clear explanation: it's under its own heatsink, beneath a humongous heatsink/fan for the CPU. Duh!
  683. reply
  684.  
  685.        
  686. craigching 2 hours ago | parent | prev | next [–]
  687.  
  688. I use my Puget for builds of a large monorepo at Adobe. I could set my watch by the consistency of memory and CPU usage during a full build of our products.
  689. reply
  690.  
  691.        
  692. adrenvi 5 hours ago | prev | next [–]
  693.  
  694. Samsung 870 EVO drives were also known to fail early, including my 2TB model.
  695. https://www.techpowerup.com/forums/threads/samsung-870-evo-b...
  696.  
  697. reply
  698.  
  699.        
  700. acabal 5 hours ago | parent | next [–]
  701.  
  702. Yes, this bit me just last month. Around October 2022 I purchased 3 Samsung 870 EVO 2TBs for use in a RAID array. By January 2023, all three of them failed within a week of each other!
  703. Fortunately they failed one by one, so I was barely able to recover my RAID array by pulling out one drive at a time, powering the computer off, and waiting for the RMA replacement to arrive.
  704.  
  705. But imagine my shock to see one drive fail... only to replace it with an RMA... and then days later, seeing the next drive fail... and the next!
  706.  
  707. reply
  708.  
  709.        
  710. gjvc 4 hours ago | root | parent | next [–]
  711.  
  712. when equipping a storage system, it does make sense to use disks from different manufacturers, different models, and different vintages (manufacturing batches).
  713. equipping a storage system with disks all of the same make, model, and vintage is invoking the statistics gods to strike failure all at once (or close enough that you won't be able to keep up with the rate of failure and time to rebuild)
  714.  
  715. personal experience: attempting to rescue a failing 192 disk system containing disks all of the same make and model. wearisome.
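Mixing makes (and, by extension, models and vintages) is easy to enforce mechanically when laying out mirrors. A greedy sketch, assuming each drive is tagged with a vendor; the `diverse_mirror_pairs` helper and the serials are hypothetical, and batch or model could be folded into the tag the same way. It always pairs drives from the two largest remaining vendor groups, and returns any drives it could only have paired with a same-vendor partner.

```python
from collections import defaultdict

def diverse_mirror_pairs(drives):
    """Pair (serial, vendor) drives into 2-way mirrors with distinct vendors.

    Greedy: always take one drive from each of the two vendor groups with the
    most unpaired drives. Returns (pairs, leftovers); leftovers all share one
    vendor, so pairing them would put two same-vendor drives in a mirror.
    """
    groups = defaultdict(list)
    for serial, vendor in drives:
        groups[vendor].append(serial)
    pairs = []
    while True:
        # Vendors that still have unpaired drives, largest group first.
        ranked = sorted((v for v in groups if groups[v]),
                        key=lambda v: len(groups[v]), reverse=True)
        if len(ranked) < 2:
            break
        a, b = ranked[0], ranked[1]
        pairs.append(((groups[a].pop(), a), (groups[b].pop(), b)))
    leftovers = [(s, v) for v, serials in groups.items() for s in serials]
    return pairs, leftovers

# Hypothetical inventory: two drives each from three vendors.
inventory = [("S1", "Samsung"), ("S2", "Samsung"), ("W1", "WD"),
             ("W2", "WD"), ("K1", "Kingston"), ("K2", "Kingston")]
pairs, leftovers = diverse_mirror_pairs(inventory)
for a, b in pairs:
    print(a, b)
print("unpaired:", leftovers)
```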
  716.  
  717. reply
  718.  
  719.        
  720. bombcar 5 hours ago | root | parent | prev | next [–]
  721.  
  722. A perfect example of why RAID ain’t backup
  723. reply
  724.  
  725.        
  726. acabal 5 hours ago | root | parent | next [–]
  727.  
  728. Indeed! Fortunately the RAID was also backed up offsite. But the entire process was shocking in several ways.
  729. At least Samsung was fairly speedy with the RMAs and it was basically no-questions-asked... because I imagine they're getting tons of these mailed back to them.
  730.  
  731. reply
  732.  
  733.        
  734. arprocter 3 hours ago | parent | prev | next [–]
  735.  
  736. Yep, my 2TB also got hit by this - did an RMA in November
  737. The replacement drive they sent me has behaved itself so far, touch wood
  738.  
  739. reply
  740.  
  741.        
  742. Dalewyn 5 hours ago | prev | next [–]
  743.  
  744. I wasn't aware the 980 Pro 2TB was also affected; I have four of those in a new machine I put together last year.
  745. Time to install some bloatware and see about updating their firmwares, I guess...
  746.  
  747. reply
  748.  
  749.        
  750. taspeotis 1 hour ago | prev | next [–]
  751.  
  752. We have Lenovo laptops at work with M.2’s that are OEM-branded Samsungs.
  753. One bricked itself into read-only mode after a few months.
  754.  
  755. The other has been losing 1% health each week or so. I caught it losing 2% in just two days recently.
  756.  
  757. These drives are older than the 990 model mentioned in the article, but I suspect they’re dud drives anyway.
  758.  
  759. Nothing lost except time - they can be swapped under warranty. But I used to buy Intel exclusively before swapping to Samsung when Intel started selling rebranded drives.
  760.  
  761. I guess the search for a reliable vendor starts again…
  762.  
  763. reply
  764.  
  765.        
  766. jeffbee 1 hour ago | parent | next [–]
  767.  
  768. These anecdotes are pretty frustrating without the other key piece of information. For the given lifetime indicators, how many writes were served? Are they wearing out faster than their TBW claims, or are they being written more than you expected?
  769. reply
  770.  
  771.        
  772. taspeotis 45 minutes ago | root | parent | next [–]
  773.  
  774. It’s at 86% with 12.21TB written. Total power-on time 68 days. Drive temp sits around 45 degrees Celsius.
  775. I’m not paying great attention to all the SMART counters day by day.
  776.  
  777. It’s for a dev workload so like … compiling code and stuff? I have the exact same workload on my desktop PC and its Samsung drive health is 99% after … years.
  778.  
  779. reply
  780.  
  781.        
  782. jeffbee 37 minutes ago | root | parent | next [–]
  783.  
  784. OK, so close to 1% lifetime per TBW, or lifetime approximately 100TBW. Thanks! That's consistent with their endurance claims for the smallest SSDs (128GB PM991 for example, or 256GB 960 Evo) but it would be poor for a larger one.
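That estimate is a linear extrapolation from the two numbers taspeotis posted. A sketch, assuming the health indicator falls in proportion to host writes (real wear also depends on write amplification, so treat the result as a rough projection; `projected_lifetime_tbw` is a made-up name):

```python
def projected_lifetime_tbw(tb_written: float, percent_used: float) -> float:
    """Linearly extrapolate the TB written at which the wear indicator hits 100%."""
    if percent_used <= 0:
        raise ValueError("need some measured wear to extrapolate from")
    return tb_written * 100 / percent_used

# From the thread: 86% health remaining (14% used) after 12.21 TB written.
print(f"~{projected_lifetime_tbw(12.21, 100 - 86):.0f} TBW")  # ~87 TBW
```

Comparing that figure with the drive's datasheet TBW rating is what would show whether it is wearing faster than advertised; the rating for these OEM drives is not given in the thread.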
  785. reply
  786.  
  787.        
  788. taspeotis 18 minutes ago | root | parent | next [–]
  789.  
  790. Cool well these are 2TB drives so…
  791. reply
  792.  
  793.        
  794. blindriver 2 hours ago | prev | next [–]
  795.  
  796. Is there any way to update my SSD on Windows without downloading Samsung Magician? I have the Samsung Evo across multiple systems and my Linux ones are okay but my Windows machines aren't okay unfortunately, they have the bad firmware.
  797. reply
  798.  
  799.        
  800. bullen 3 hours ago | prev | next [–]
  801.  
  802. The real problem is OSes that write to disk for no good reason.
  803. Windows 10 writes 100KB/s constantly.
  804.  
  805. That should be illegal.
  806.  
  807. reply
  808.  
  809.        
  810. jandrese 42 minutes ago | parent | next [–]
  811.  
  812. Try running Windows 10 off of an older spinning laptop drive. It can take upwards of 40 minutes to display the desktop on the first boot after Windows Update runs. Even in normal operation those constant low level writes leave barely any breathing room for your actual applications. Full size hard drives do a bit better, but even then it can be pretty painful when the drive indexing service kicks off or .NET is updated.
  813. reply
  814.  
  815.        
  816. donmcronald 28 minutes ago | root | parent | next [–]
  817.  
  818. Windows 10 on an SSD feels like Windows 7 on a spinning disk. Microsoft has wiped out the gains we got from SSDs.
  819. reply
  820.  
  821.        
  822. rogers18445 3 hours ago | parent | prev | next [–]
  823.  
  824. This doesn't actually matter in a practical sense. Assuming 24/7 operation, it's about 3TB a year, which is ~1% of drive endurance.
  825. Also, if you are worried about overwriting the same files over and over, that doesn't matter either. Block device addresses are not physical addresses; the controller remaps them so the drive wears evenly.
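The 3TB/year figure checks out as arithmetic. A sketch, assuming a constant 100 KB/s with decimal units (1 KB = 1000 bytes) and no write amplification; the "~1%" then implies a drive rated at roughly 300 TBW:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s

def tb_per_year(bytes_per_second: float) -> float:
    """Host writes per year, in decimal terabytes, at a constant rate."""
    return bytes_per_second * SECONDS_PER_YEAR / 1e12

yearly = tb_per_year(100_000)          # 100 KB/s of background writes
print(f"{yearly:.2f} TB/year")         # 3.15 TB/year
# Rated endurance for which that amounts to 1% per year:
print(f"{yearly / 0.01:.0f} TBW")      # 315 TBW
```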
  826.  
  827. reply
  828.  
  829.        
  830. wtallis 2 hours ago | root | parent | next [–]
  831.  
  832. On the other hand, lots of tiny writes scattered all over will tend to produce much higher write amplification than large sequential writes. So you'll get more actual wear to the drive from the 3TB of constant background churn than if you copied in 3TB of movies.
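A toy model makes the point concrete: if each small write that is not coalesced forces the drive to program whole flash pages, NAND writes exceed host writes by the ratio computed below. A sketch under assumed numbers: a 4 KiB program unit (real NAND pages are often larger), with alignment effects and garbage collection, which add more amplification on top, left out. The function name is hypothetical.

```python
import math

def toy_write_amplification(write_sizes, page_size=4096):
    """NAND bytes programmed / host bytes written, if every host write
    is flushed on its own and rounded up to whole pages (toy model)."""
    host_bytes = sum(write_sizes)
    nand_bytes = sum(math.ceil(size / page_size) * page_size for size in write_sizes)
    return nand_bytes / host_bytes

print(toy_write_amplification([512] * 1000))   # 8.0: each 512 B write costs a 4 KiB page
print(toy_write_amplification([1 << 30]))      # 1.0: one large sequential write
```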
  833. reply
  834.  
  835.        
  836. rogers18445 1 hour ago | root | parent | next [–]
  837.  
  838. Those writes would have to be significantly smaller than the SSD's page (sector) size, which is 512 bytes or 4 KiB, and would have to be written to different pages in rapid succession (to be flushed apart). A standard serial write wouldn't trigger this even if it's 1 byte at a time; the OS FS cache would buffer it.
  839. It would have to be very misbehaving software or deliberate sabotage.
  840.  
  841. reply
  842.  
  843.        
  844. vlovich123 1 hour ago | root | parent | next [–]
  845.  
  846. I’m pretty sure SSDs can only do 4 KiB-aligned writes regardless of the FS sector size (under the hood it’s write amplification unless the OS or controller manages to coalesce them). But yeah, it depends on how things are getting flushed, and generally I wouldn’t expect too much magic unless you get lucky. It sounds like a small bug in the OS (i.e. these kinds of writes should be batched in memory in the application).
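Whether coalescing happens is the crux of this sub-thread. A toy comparison, assuming a write-back cache that merges all writes touching the same 4 KiB page before flushing, versus flushing every write immediately. `pages_programmed` is a hypothetical helper, not how any particular OS or controller works.

```python
def pages_programmed(writes, page_size=4096, coalesce=True):
    """Count page programs for (offset, length) host writes, length >= 1.

    coalesce=True: dirty pages are merged in a cache and each is flushed once.
    coalesce=False: every write is flushed on its own, page by page.
    """
    def pages_touched(offset, length):
        first = offset // page_size
        last = (offset + length - 1) // page_size
        return range(first, last + 1)

    if coalesce:
        dirty = set()
        for offset, length in writes:
            dirty.update(pages_touched(offset, length))
        return len(dirty)
    return sum(len(pages_touched(offset, length)) for offset, length in writes)

# 4096 sequential 1-byte appends, like the "1 byte at a time" case mentioned earlier.
appends = [(i, 1) for i in range(4096)]
print(pages_programmed(appends, coalesce=False))  # 4096
print(pages_programmed(appends))                  # 1
```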
  847. reply
  848.  
  849.        
  850. donmcronald 25 minutes ago | root | parent | next [–]
  851.  
  852. I thought some of them even do 8KB. I’ve seen ZFS tips that claim you should use 8KB blocks on things like an 850 Pro.
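In ZFS terms, that tip is about the `ashift` pool property: ZFS assumes a sector size of 2^ashift bytes, so 4 KiB corresponds to ashift=12 and 8 KiB to ashift=13. A small sketch of the conversion (the function name is made up, and it cannot tell you whether 8 KiB actually suits a given drive):

```python
def zfs_ashift(sector_bytes: int) -> int:
    """Return ashift = log2(sector size); the size must be a power of two."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1

for size in (512, 4096, 8192):
    print(size, zfs_ashift(size))  # 512 9 / 4096 12 / 8192 13
```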
  853. reply
  854.  
  855.        
  856. rogers18445 1 hour ago | root | parent | prev | next [–]
  857.  
  858. I do wonder if perhaps the good NVMe SSD controllers come with magic. It would take a single instance of malware ruining SSDs with 4000x write amplification to taint some brands while aiding the marketing of others.
  859. reply
  860.  
  861.        
  862. matja 2 hours ago | root | parent | prev | next [–]
  863.  
  864. Except "evenly" is not a standard, or something anyone other than the manufacturer can verify; it's hidden in the firmware, so we really have no idea.
  865. reply
  866.  
  867.        
  868. walterbell 3 hours ago | parent | prev | next [–]
  869.  
  870. Do we know where those writes go? Perhaps they could be redirected to a ramdisk.
  871. reply
  872.  
  873.        
  874. rasz 2 minutes ago | root | parent | next [–]
  875.  
  876. Management > Windows Logs: over 100 active logs. Performance > Data Collector Sets: over 50 active Event Trace Sessions running.
  877. reply
  878.  
  879.        
  880. paranoidrobot 3 hours ago | root | parent | prev | next [–]
  881.  
  882. Sysinternals tools will show up the writes and what is causing them.
  883. I've dug down and found random things doing dumb stuff in the past. Verbose logging turned on by default for some services, for example.
  884.  
  885. reply
  886.  
  887.        
  888. craigching 2 hours ago | prev | next [–]
  889.  
  890. I have a Puget system with the Samsung SSDs mentioned, and it locks up on me every 2-4 weeks. This sounds like it would explain the problem. Puget sent out a message this week to upgrade the Samsung firmware, but I am on the latest. I’ll be contacting Puget support on Monday so I’m on their radar.
  891. I will say I love my Puget, its performance has been killer other than these lockups. And I’ve heard only good things about Puget support. I should have reported this months ago, but it’s just now that I’m doing some critical work on Windows that it’s affecting me.
  892.  
  893. reply
  894.  
  895.        
  896. dataflow 3 hours ago | prev | next [–]
  897.  
  898. Tangent: what's the cheapest 4TB+ PCIe SSD without significant known issues but with hardware encryption? The SN850x seems to have its own issues, and beyond these everything else seems so expensive.
  899. reply
  900.  
  901.        
  902. nine_k 3 hours ago | parent | next [–]
  903.  
  904. Is hardware encryption a widespread feature?
  905. What are the benefits of it for you, compared to OS-level full disk encryption?
  906.  
  907. reply
  908.  
  909.        
  910. wnevets 2 hours ago | parent | prev | next [–]
  911.  
  912. > The SN850x seems to have its own issues
  913. Such as?
  914.  
  915. reply
  916.  
  917.        
  918. dataflow 2 hours ago | root | parent | next [–]
  919.  
  920. https://community.wd.com/t/sn850x-the-driver-detected-a-cont...
  921. reply
  922.  
  923.        
  924. emodendroket 1 hour ago | prev | next [–]
  925.  
  926. Wouldn't be surprised if this is another pandemic supply chain casualty.
  927. reply
  928.  
  929.        
  930. jandrese 46 minutes ago | parent | next [–]
  931.  
  932. Sounds more like crappy firmware to me. This is not the first SSD to suffer from a crippling firmware flaw that is impossible to fix.
  933. I lost almost an entire computer lab of Dells thanks to the goddamn Sandforce firmware. One that the company acknowledged but refused to lift a finger to fix. Luckily it is possible to fix these yourself despite the vendor hostility towards the repair. Look how easy it is: https://computerlounge.it/how-to-unbrick-sandforce-ssd/
  934.  
  935. reply
  936.  
  937.        
  938. Havoc 2 hours ago | prev | next [–]
  939.  
  940. > high failure rates in the field
  941. Yikes. A lot of people were hoping that the high wear was just a reporting artifact.
  942.  
  943. reply
  944.  
  945.        
  946. xoa 5 hours ago | prev | next [–]
  947.  
  948. Ars had a piece covering this as well, and I do wonder if there is something going on somewhere else in the Samsung stack, not just the NVMe 900 series line. Pure anecdote, but two years ago I did a NAS for a client using 24x 2TB Samsung 870 Evo drives (they'd gotten some incentive deal for it). While it was all one type vs mixed, there was the "luxury" of time because at that point getting the system they wanted together had a significant lead time. So I did ensure that the drives were purchased over the course of around 7 months, from multiple different reputable sellers (B&H, CDW, Provantage etc) in separate batches. System was solid, an Epyc 2 based SuperMicro server, running TrueNAS.
  949. And then last year, with around 5500-7500 quite light hours of runtime (primarily reads, ~0.08 DWPD, well under the official rating of 0.3 DWPD), drives started failing. These were definitely real failures; the first indication came from regular automated ZFS scrubs reporting increasing checksum errors and ATA errors. It hit so many drives, and I'd always considered Samsung SSDs relatively reliable (even the consumer ones), that at first I thought it was a SATA controller failure; our rep agreed, and we sent the server back under warranty. They were great (gold-plated support contracts pay off once in a while), and after a motherboard replacement and thorough testing it was back in service. More drive problems. SMART short tests said everything was healthy, and the first long tests did too. But then drives exceeded error limits and started getting faulted, and at last SMART long tests started failing. Digging in showed worrisome stats. So began swapping out and warrantying drives (cheers to TrueNAS for handling that stress test: in the end, zero downtime and no need to restore from backups). In the end, THIRTEEN (13) out of 24 failed. A brutal >50% dead drive rate. I talked to some others around and they'd seen <1 year failure rates also at 30-60%. Big :\. Our rep also indicated they were hearing more about Samsung failures.
  950.  
  951. Anyway, gave me a talking point going forward to really, really press management on "it's worth paying for drives from 3-4x brands and maybe splurging for higher rated vs consumer too", but it also made me wonder if there is something going on, or was (pandemic related?), at Samsung's storage division. It's definitely pure anecdote, but still: I spread those drive purchases out reasonably hard, and they had radically different serial numbers. Same with other folks I know at other businesses using various Samsung drives; everyone has been going to real effort following decent practices to avoid buying all their drives from a single lot. Even a 10% failure rate for consumer drives I could have seen, but 54%? And not a bathtub curve all front-loaded in the first month or two, but failures after 7-11 months? That feels high? Samsung did replace them all no questions asked, and they paid for shipping too. I don't have any global insight into how this all looks, and it could be just plain bad luck for all of us in the region, but still.
  952.  
  953. reply
  954.  
  955.        
  956. paulmd 3 hours ago | parent | next [–]
  957.  
  958. > Anyway, gave me a talking point going forward to really, really press management on "it's worth paying for drives from 3-4x brands and maybe splurging for higher rated vs consumer too", but it also made me wonder if there is something going on, or was (pandemic related?), at Samsung's storage division.
  959. What's going on at Samsung is the same thing that's always been going on at Samsung: they use their own flash and their own controllers and they have their own sets of problems as a result. It was a selling point in the early days of Sandforce and other turds (leading to things like the OCZ Vertex series) but now the commodity market has caught up and Samsung doing their own thing is kind of a negative. Like yeah it's fine as long as they don't screw up, but they're screwing up a lot more than they used to.
  960.  
  961. I don't see any direct correlation between the various failures that have occurred over the years. 840 Evo had something wrong with the flash NANDs that caused them to lose charge over time (leading to data loss) so they put out a new firmware that would just continuously write the flash in a tape-loop sort of deal (lol) to avoid the data ever aging out. I don't count that as a controller flaw, that's a flash flaw that was fixed with a firmware patch.
  962.  
  963. 870 Evo, 970 Evo, 970 Evo Plus, and 980 have all been accused of having problems over the years, in addition to the 980 Pro and now 990 Pro, but there's actually a pretty good variety of controller models as well as different flash types (from 64-layer to 236-layer) there. It's hard to know how much firmware they all share though... or whether it was more flash problems in certain batches, or what. But overall Samsung certainly has had a lot of failures in recent years and I think it all really comes back to the fact that they're using their own controllers and their own flash, while everyone else is pretty much commodity at this point... meaning they get their own bugs too.
  964.  
  965. https://docs.google.com/spreadsheets/d/1B27_j9NDPU3cNlj2HKcr...
  966.  
  967. reply
  968.  
  969.        
  970. walrus01 4 hours ago | prev | next [–]
  971.  
  972. I wish more independent review organizations would conduct destructive "write lifespan until ultimate failure" real world tests on SSDs. With a mixture of real world large contiguous files and small random writes.
  973. Real ultimate write lifespan on 3-level-cell and QLC consumer grade SSDs varies wildly for things of the same capacity and similar price.
  974.  
  975. Such as this series of tests from 7 years ago: https://techreport.com/review/27909/the-ssd-endurance-experi...
  976.  
  977. It looks like the bar charts and other data in that URL are now broken, which is sad, because I recall reading it when it was first published and it shows some amazing differences between the drives that died first, and the ones that died last.
  978.  
  979. another similar: https://www.guru3d.com/news-story/endurance-test-of-samsung-...
  980.  
  981. reply
  982.  
  983.        
  984. Havoc 2 hours ago | parent | next [–]
  985.  
  986. >destructive "write lifespan until ultimate failure" real world tests on SSDs
  987. >from 7 years ago
  988.  
  989. It's from 7 years back for good reason. They stopped doing those tests when it became impractical as endurance increased. The drives are now good enough that you can't wear them out fast enough to make sense in a review setting
  990.  
  991. ...unless fundamentally broken like these
  992.  
  993. reply
  994.  
  995.        
  996. walrus01 0 minutes ago | root | parent | next [–]
  997.  
  998. I find it highly improbable that you couldn't wear out a 3-level-cell or 4-level-cell consumer grade SSD which is capable of 300-500MB/s writes with a 24x7 automated test script in just a few months.
  999. Or at least leave it running for a couple of weeks and then see what the SMART-reported remaining write lifespan data reports it to be, versus the brand new out of box baseline.
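Rough arithmetic supports the "few months" estimate, at least against a rated figure. A sketch assuming sustained throughput stays constant (many consumer drives slow down once their SLC cache fills, so real runs take longer) and a hypothetical 600 TBW rating; note that rated TBW is a warranty limit, not the point at which a drive actually dies.

```python
def days_to_rated_tbw(rated_tbw: float, write_mb_per_s: float) -> float:
    """Days of continuous writing to reach a TBW rating (decimal units)."""
    tb_per_day = write_mb_per_s * 1e6 * 86_400 / 1e12
    return rated_tbw / tb_per_day

# Hypothetical 600 TBW drive at the speeds mentioned in the comment.
for speed in (300, 500):
    print(speed, "MB/s:", round(days_to_rated_tbw(600, speed), 1), "days")
# roughly 23 and 14 days respectively
```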
  1000.  
  1001. reply
  1002.  
  1003.        
  1004. donmcronald 17 minutes ago | root | parent | prev | next [–]
  1005.  
  1006. The whole review industry just stopped scrutinizing SSDs several years ago, right around the time manufacturers started cutting features like power loss protection and DRAT/RZAT along with switching to TLC and QLC.
  1007. Funny how that worked out.
  1008.  
  1009. reply
  1010.  
  1011.        
  1012. rom-antics 3 hours ago | parent | prev | next [–]
  1013.  
  1014. I'd like to see this too!
  1015. The official line is that endurance should not matter for most people. For example, the Samsung 990 Pro 4TB is rated for 2400 TBW, which, if the drive has a service life of 5 years, is 1.3TB of data written per day. The average user will need < 1% of that.
  1016.  
  1017. Where that falls down though of course is cases like this. The point of a review is to show when real-world performance doesn't match the marketing. Tech reviewers seem to be blindly trusting the marketing on this one. They're really dropping the ball.
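The per-day figure is the rated TBW divided by the service period; dividing again by capacity gives the DWPD (drive writes per day) number that datasheets also use. A sketch with the 990 Pro 4TB numbers quoted in the comment, assuming a 5-year period (the function name is made up):

```python
def endurance_per_day(rated_tbw: float, capacity_tb: float, years: float = 5.0):
    """Spread a TBW rating evenly over `years`: returns (TB/day, DWPD)."""
    tb_per_day = rated_tbw / (years * 365)
    return tb_per_day, tb_per_day / capacity_tb

tb_day, dwpd = endurance_per_day(2400, 4)
print(f"{tb_day:.2f} TB/day, {dwpd:.2f} DWPD")  # 1.32 TB/day, 0.33 DWPD
```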
  1018.  
  1019. reply
  1020.  
  1021.        
  1022. KennyBlanken 3 hours ago | parent | prev | next [–]
  1023.  
  1024. What's even more important than durability is what the drive does when it runs out of write cycles.
  1025. They should just become read-only, but it seems that in the vast majority, the controller just shuts off and bricks the drive.
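The behaviour being asked for can be stated as a simple policy. A toy sketch, not modeled on any vendor's firmware (the class, the spare-block pool, and the lock rule are all hypothetical): once the pool of spare blocks used to replace worn ones runs out, the controller refuses writes but keeps serving reads, so data can still be copied off.

```python
class ToyController:
    """Hypothetical policy: when spare blocks run out, go read-only instead of
    disappearing from the bus."""

    def __init__(self, spare_blocks: int):
        self.spare_blocks = spare_blocks   # replacements left for worn-out blocks
        self.read_only = spare_blocks == 0
        self.data = {}                     # LBA -> stored bytes

    def retire_block(self):
        """A worn block failed: consume a spare; once none remain, lock writes."""
        if self.spare_blocks > 0:
            self.spare_blocks -= 1
        if self.spare_blocks == 0:
            self.read_only = True

    def write(self, lba: int, value: bytes):
        if self.read_only:
            raise PermissionError("media worn out: drive is read-only")
        self.data[lba] = value

    def read(self, lba: int) -> bytes:
        # Reads keep working in read-only mode, so data can be recovered.
        return self.data.get(lba, b"\x00" * 512)

ctrl = ToyController(spare_blocks=2)
ctrl.write(0, b"important")
ctrl.retire_block()
ctrl.retire_block()                  # last spare used: writes now locked
print(ctrl.read_only, ctrl.read(0))  # True b'important'
```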
  1026.  
  1027. reply
  1028.  
  1029.        
  1030. rekoil 4 hours ago | prev | next [–]
  1031.  
  1032. I've had 2x Samsung 980 Pro 2TB fail in as many years. Last time I buy Samsung.
  1033. reply
  1034.  
  1035.        
  1036. sushid 4 hours ago | parent | next [–]
  1037.  
  1038. My Samsung 980s (1TB so not affected per article) are still going strong but Samsung has been dropping the ball across the board for me in recent years. I've had a bad experience with their Samsung Frame last year, their fridges are a nightmare to fix, and I'm now hearing bad things about their washing machine and their SSDs. I know these departments are not related but it's not a good look.
  1039. reply
  1040.  
  1041.        
  1042. jeffbee 5 hours ago | prev | next [–]
  1043.  
  1044. Would like to know more. Were failures in the field from wear-out or sudden death? Are the health indicators losing 1% per week consistent with the datasheet TBW, or worse?
  1045. reply
  1046.  
  1047.        
  1048. KennyBlanken 3 hours ago | prev | next [–]
  1049.  
  1050. Ask anyone with a circa-2014-ish Macbook Pro about Samsung SSD reliability.
  1051. The samsung-made drives lasted about 5-6 years. Everything seemed fine, and then one day you'd get a spinning pizza of death, power down, power it back on...and your SSD was...completely gone. Doesn't even enumerate on the PCIe bus. It's just gone.
  1052.  
  1053. Screw the SSD chipset manufacturers for not making sure that their controllers can at least (a) still show up on the bus and (b) be read-only in some sort of recovery mode.
  1054.  
  1055. reply
  1056.  
  1057.        
  1058. flyinglizard 5 hours ago | prev | next [–]
  1059.  
  1060. Had issues with a 2TB 980 Pro. Things have stabilized with recent updates.
  1061. reply
  1062.  
  1063.        
  1064. gjsman-1000 5 hours ago | prev | next [–]
  1065.  
  1066. There is always the possibility that their S.M.A.R.T. implementation is borked...
  1067. reply
  1068.  
  1069.        
  1070. TillE 5 hours ago | parent | next [–]
  1071.  
  1072. The article does say they've seen "abnormally high failure rates in the field", so it's not just that.
  1073. reply
  1074.  
  1075.        
  1076. jpk 5 hours ago | parent | prev | next [–]
  1077.  
  1078. If that's all it was, then it's likely a firmware update would not only prevent the issue, but also reverse it if the storage is actually healthy. That doesn't seem to be the case here, though.
  1079. reply
  1080.  
  1081.        
  1082. kmeisthax 4 hours ago | root | parent | next [–]
  1083.  
  1084. Keep in mind that one of the things SSD firmware does is deliberately write-lock the drive if the media is too worn to erase. So buggy firmware overestimating media wear is also likely to cause a failed drive.
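A toy illustration of that failure mode, assuming wear is estimated as average erase count over rated P/E cycles and the drive write-locks at 100% (real firmware weighs more signals, and every name and number here is hypothetical): a counting bug alone is enough to lock healthy media.

```python
def percentage_used(erase_counts, rated_pe_cycles, miscount_factor=1.0):
    """Estimated wear: mean erase count relative to the rated P/E cycles.

    miscount_factor > 1 models a hypothetical firmware bug over-counting erases.
    """
    mean_erases = sum(erase_counts) / len(erase_counts) * miscount_factor
    return 100 * mean_erases / rated_pe_cycles

def write_locked(erase_counts, rated_pe_cycles, miscount_factor=1.0, limit=100):
    """Assumed policy: lock writes once estimated wear reaches `limit` percent."""
    return percentage_used(erase_counts, rated_pe_cycles, miscount_factor) >= limit

blocks = [600] * 1000                   # every block erased 600 times
print(write_locked(blocks, 1000))       # False: 60% used
print(write_locked(blocks, 1000, 2.0))  # True: the bug reports 120%
```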
  1085. reply
  1086.  
  1087.        
  1088. pifm_guy 5 hours ago | prev [–]
  1089.  
  1090. I really want to see ssd manufacturers offer a decent warranty...
  1091. This drive costs $100, and will last 10 years or until 100TB has been written to it, as long as you keep it within the specified temperature/humidity/power conditions.
  1092.  
  1093. If it fails to do that, we will return $1000 to you.
  1094.  
  1095. reply
  1096.  
  1097.        
  1098. mrtksn 5 hours ago | parent | next [–]
  1099.  
  1100. This sounds like an SLA; it's very unlikely you'll get that for 100 bucks. Even if this manufacturer somehow perfected their process and had zero defects, they would still be taking on a 10-year liability for 100 dollars of revenue.
  1101. reply
  1102.  
  1103.        
  1104. rlpb 4 hours ago | root | parent | next [–]
  1105.  
  1106. APC sell surge protectors with equipment protection insurance for less than $100. Apparently, it's possible even for products sold at $100.
  1107. reply
  1108.  
  1109.        
  1110. mrtksn 4 hours ago | root | parent | next [–]
  1111.  
  1112. Sure, should be possible to sell you an insurance.
  1113. reply
  1114.  
  1115.        
  1116. walterbell 5 hours ago | parent | prev | next [–]
  1117.  
  1118. In theory, a 3rd party insurance equivalent to AppleCare could be constructed for some technology products, but this is hampered by short product lifecycles, lack of BOM transparency (e.g. components changed within a single product generation) and the ability of firmware updates to change product behavior and invalidate previously collected reliability data.
  1119. Open-source SSD firmware would provide more transparency on performance and reliability.
  1120.  
  1121. reply
  1122.  
  1123.        
  1124. CharlesW 5 hours ago | root | parent | next [–]
  1125.  
  1126. > Open-source SSD firmware would provide more transparency on performance and reliability.
  1127. This seems fantastic. Are you saying you could review the firmware source and know that the 980 Pro would lose ~1% of its endurance per week?
  1128.  
  1129. reply
  1130.  
  1131.        
  1132. joenathanone 5 hours ago | parent | prev | next [–]
  1133.  
  1134. Lifetime warranties used to be commonplace, I wish we could return to those times, or at least to a time of repairability.
  1135. reply
  1136.  
  1137.        
  1138. nightfly 4 hours ago | root | parent | next [–]
  1139.  
  1140. Lifetime warranty on a consumable product (SSDs have a limited number of writes) doesn't seem reasonable.
  1141. reply
  1142.  
  1143.        
  1144. joenathanone 3 hours ago | root | parent | next [–]
  1145.  
  1146. True, in my perfect world I would settle for a trade in program, you would get some value for the failed unit so that you can upgrade and the OEM could recycle the raw materials. If we will ever live in a sustainable society we are going to need repairability and recycling programs for all consumer products.
  1147. reply
  1148.  
  1149.        
  1150. paulmd 3 hours ago | root | parent | prev | next [–]
  1151.  
  1152. using up the flash doesn't hurt the controller, though. the controller still knows how much writing it's done even if the flash itself is toast, it's a totally different part of the drive.
  1153. And even still, you could construct the controller so that it was burning e-fuses to indicate lifespan and the fuses could be readable through JTAG, short of complete controller death or lightning-strike level surges (which you can legitimately argue as being abuse and not warrantyable) you could make it offline-readable from an external device.
  1154.  
  1155. https://en.wikipedia.org/wiki/EFuse
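The e-fuse scheme described above can be modeled as a thermometer-coded counter. A sketch in which the fuse count, the TB-per-fuse step, and the class itself are all hypothetical (real e-fuse banks are blown and read in hardware, e.g. over JTAG, not in Python): because a fuse can only go from intact to blown, the record only ever moves forward and stays readable even if the flash is dead.

```python
class EfuseWearCounter:
    """Hypothetical one-time-programmable wear record: one fuse is blown per
    `tb_per_fuse` TB written. Fuses only go 0 -> 1, so the record is monotonic."""

    def __init__(self, n_fuses: int = 64, tb_per_fuse: float = 10.0):
        self.fuses = [0] * n_fuses
        self.tb_per_fuse = tb_per_fuse

    def record_writes(self, total_tb_written: float):
        target = min(len(self.fuses), int(total_tb_written // self.tb_per_fuse))
        for i in range(target):
            self.fuses[i] = 1  # blowing is irreversible; never reset to 0

    def read_tb_lower_bound(self) -> float:
        """What an external reader could recover: blown fuses x step size."""
        return sum(self.fuses) * self.tb_per_fuse

counter = EfuseWearCounter()
counter.record_writes(125)   # controller reports 125 TB written so far
counter.record_writes(50)    # a lower (e.g. reset) value cannot un-blow fuses
print(counter.read_tb_lower_bound())  # 120.0
```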
  1156.  
  1157. The problems here are primarily economic/social, not technical. Companies don't want to hold warranty liability on their books for 10+ years, but they also don't really want to accept returns for defective products or other things either, and we make them do it anyway.
  1158.  
  1159. The EU is already pushing warranties to a minimum of two years for exactly this reason. Could it be 5 years, or 10 years? Sure, why not.
  1160.  
  1161. Companies will scream in the short term, of course. It's cheaper for them to push out crap that'll die and be in the trash in 3 years. Engineering products for longer lifespans would be a shift in engineering/design mindset. It probably would also push minimum device costs upwards at least a little bit, but, that's not a bad thing either - the slogan is "reduce, reuse, recycle", in that order, and "reduce" there means simply buy less or buy things that last longer. A shift away from planned obsolescence isn't the worst thing culturally, we don't want to encourage design-for-disposability.
  1162.  
  1163. Especially as Moore's Law slows, hardware is relevant for longer and longer periods of time. For example, a lot of people are finding that their GPUs are dying before they're actually irrelevant as hardware. It's not just NVIDIA who had bumpgate, a ton of hardware from that era failed over time due to faulty solder and probably could have been fixed with an hour of a tech's work.
  1164.  
  1165. Even worse, they're often dropped from support. There's really nothing wrong with a R9 290X as a GPU, but AMD won't support it with software anymore, despite the fact that it basically works anyway and it's pretty much purely a software lockout (which third parties have hacked and bypassed), because they want you to buy the new one. Wouldn't it be nice if GPUs were just expected to work for 10 years from purchase and that was covered by warranty and software support?
  1166.  
  1167. There are an increasing number of people who do hang onto hardware for 5-10 years because the relevant lifespan is getting longer and longer, and we should encourage that and require companies to support those consumption patterns. Just like not gluing together phones to make the battery irreplaceable, we really should be making sure electronics bumpouts don't fail in 3-5 years and that companies don't dump-and-run on the software.
  1168.  
  1169. Routers are another one where the software support is just egregious, too. How many rando Linksys or TP-Link or whatever actually get an update when a bunch of new vulnerabilities in WPA or whatever are discovered? Not that many, and "just install OpenWRT" is not a society-level answer especially when companies are locking down hardware.
  1170.  
  1171. reply
  1172.  
  1173.        
  1174. thfuran 4 hours ago | root | parent | prev | next [–]
  1175.  
  1176. It also used to be the case that a computer was basically ewaste within two or three years because a new one would be ten times faster.
  1177. reply
  1178.  
  1179.        
  1180. joenathanone 4 hours ago | root | parent | next [–]
  1181.  
  1182. Growing up poor, I was always a few generations behind, rocking a 486 DX2 when the PII & PIII were the latest and greatest. 33kbps modem when others had 56k. When I was 10ish, me and my older brother would go to the thrift store and dig through the computer parts; it was an adventure.
  1184.  
  1185.        
  1186. paulmd 3 hours ago
  1187.  
  1188. > When I was 10ish, my older brother and I would go to the thrift store and dig through the computer parts; it was an adventure.
  1189. I miss that too. Thrift stores suck now: they pull all the good clothes and sell them to upcyclers, and they pull all the cool electronics, cameras, and other stuff and sell it on ShopGoodwill and eBay.
  1190.  
  1191. And ShopGoodwill is pretty absurd: almost everything is sold as-is and uninspected, and prices are as high as eBay's, sometimes higher.
  1192.  
  1193. The days of wandering through a Goodwill and finding some neat stuff at a bargain price are gone now, unfortunately.
  1194.  
  1197.        
  1198. TacticalCoder 5 hours ago
  1199.  
  1200. Back when HDDs failed a lot, warranties actually worked. I'd happily fill out an online form, Web 1.0 style, then send in my Seagate disks (I'm in Europe; I was sending them to the Netherlands IIRC), and a few weeks later I'd receive a new drive.
  1201. I probably still have a few screenshots of these forms somewhere.
  1202.  
  1205.        
  1206. h2odragon 3 hours ago
  1207.  
  1208. Perhaps an insurance agent can craft a policy to do that for you.
  1209. Failing that, maybe a bookmaker.
  1210.  
  1213.        
  1214. Spooky23 4 hours ago
  1215.  
  1216. HPE does that for enterprise disks. But it ain’t free!
  1218.  
  1219.        
  1220. jeffbee 5 hours ago
  1221.  
  1222. I am not sure why you want a 10x refund, but it seems like your request is easily met by current warranties. A 1TB WD SN850X advertises 1200TBW endurance, rather more than you require.
  1224.  
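For a sense of scale, you can turn a TBW rating into years of use by dividing it by a daily write volume. This is a rough sketch that assumes decimal units (1 TB = 1000 GB, the convention drive makers use for TBW) and a hypothetical write rate of 50 GB per day. The 1200 TBW figure is the one quoted above, and `endurance_years` is just an illustrative name:

```python
def endurance_years(tbw: float, gb_written_per_day: float) -> float:
    """Years until a drive's rated TBW is used up at a constant daily write rate.

    Assumes decimal units (1 TB = 1000 GB), as used for TBW ratings.
    """
    total_gb = tbw * 1000                    # rated endurance in GB
    days = total_gb / gb_written_per_day     # days of writing at this rate
    return days / 365.25                     # convert days to years


# Hypothetical workload: 50 GB written per day against the quoted 1200 TBW rating.
print(round(endurance_years(1200, 50), 1))  # 65.7
```

SSD warranties typically run for a fixed number of years or until the rated TBW is reached, whichever comes first. At write rates like this, the time limit is what actually ends coverage.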
  1225.        
  1226. paulmd 2 hours ago
  1227.  
  1228. https://www.law.cornell.edu/wex/punitive_damages
  1229. It seems clear the idea is to make companies err well on the side of lifespan rather than design something that fails a month after the warranty expires: if they cut it close, a decent number of units will fail inside the warranty period and they'll be liable.
  1230.  
  1231. Even if a company is required to stand behind the product, a lot of consumers won't pursue a claim if it doesn't seem worth the trouble. Do you care about the 120GB drive you bought in 2012? Not really. Do you care if you can get 10x its original purchase price (about $120 at $1/GB) for it? Sure, $1200 is worth my trouble.
  1232.  
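Spelled out, the refund in that example is capacity times price per GB times the multiplier. Here is a trivial sketch of it. `punitive_refund` and its default 10x multiplier are hypothetical; they come from the rule being floated in this thread, not from any existing warranty:

```python
def punitive_refund(capacity_gb: float, price_per_gb: float, multiplier: float = 10.0) -> float:
    """Refund owed for a failed drive under a hypothetical 'N times the purchase price' rule."""
    purchase_price = capacity_gb * price_per_gb  # what the buyer originally paid
    return multiplier * purchase_price


# 120 GB drive bought at $1/GB: original price $120, refund under a 10x rule.
print(punitive_refund(120, 1.0))  # 1200.0
```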
  1233. As they say: "A times B times C equals X. If X is less than the cost of a recall, we don't do one."
  1234.  
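The recall line comes from Fight Club, and it is just an expected-cost comparison. The sketch below shows where a punitive multiplier and the share of owners who actually file a claim enter that comparison. `claim_rate` stands in for the point above that many consumers won't bother. Every number and function name here is hypothetical, chosen only to show the decision flipping:

```python
def expected_liability(units: int, failure_rate: float, claim_rate: float,
                       payout_per_claim: float) -> float:
    """A times B times C, with the failure rate further scaled by the share of owners who claim."""
    return units * failure_rate * claim_rate * payout_per_claim


def should_recall(units: int, failure_rate: float, claim_rate: float,
                  payout_per_claim: float, recall_cost: float) -> bool:
    """Recall only if expected payouts are at least the cost of a recall."""
    return expected_liability(units, failure_rate, claim_rate, payout_per_claim) >= recall_cost


# Hypothetical run of 100,000 drives, 5% failure rate, $500,000 recall cost.
# Refund of the $120 purchase price, and only 1 in 10 affected owners claims:
print(should_recall(100_000, 0.05, 0.1, 120, 500_000))   # False
# 10x refund ($1200), and now half of affected owners think it's worth the trouble:
print(should_recall(100_000, 0.05, 0.5, 1200, 500_000))  # True
```

The multiplier does double duty in this sketch. It raises the payout per claim, and a bigger payout plausibly raises the claim rate too, which is the thread's argument for making the refund large.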
  1235. I'm not OP and I'm not gonna die on this hill as a point of policy, but if 9 out of 10 consumers just shrug, accept that their 8-year-old drive has failed, and throw it in the garbage, that's still bad at a society-wide level, where you want people using hardware for longer and longer. That matters even more as Moore's law tapers off and hardware stays relevant longer: an R9 290X is still a pretty nice piece of hardware!
  1236.  
  1237. Michigan used to do something very similar with checkout price scanners: if the price coded in the system was higher than the advertised price, you got 10 times the difference, up to a limit. The point was to get retailers to pay fucking attention, because a 50-cent pricing error on a can of chili could cost them 5 bucks. Punitive damages, with the citizens who spot the violations collecting the bounty.
  1238.  
  1239. https://www.canr.msu.edu/news/michigan_changed_item_pricing_...
  1240.  
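The scanner rule, as summarized in the comment, fits in a small function. This sketch follows that summary, not the actual Michigan statute. The chili prices are made up, and amounts are kept in integer cents to avoid floating-point rounding. `cap_cents` is a hypothetical placeholder because the real limit isn't stated here:

```python
from typing import Optional


def scanner_bounty_cents(advertised_cents: int, charged_cents: int,
                         multiplier: int = 10, cap_cents: Optional[int] = None) -> int:
    """Bounty owed to the shopper under a '10x the overcharge, up to a limit' rule.

    `cap_cents` is a placeholder for the limit; None means no cap is applied.
    """
    overcharge = charged_cents - advertised_cents
    if overcharge <= 0:
        return 0  # charged the advertised price or less: no bounty
    bounty = multiplier * overcharge
    if cap_cents is not None:
        bounty = min(bounty, cap_cents)  # apply the limit, if one is given
    return bounty


# The can-of-chili example: a 50-cent scanner error (prices are hypothetical).
print(scanner_bounty_cents(advertised_cents=199, charged_cents=249))  # 500, i.e. $5
```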
  1243.        
  1244. dataflow 3 hours ago
  1245.  
  1246. The SN850X seems to have its own issues, from what I've read (just google it).