bobthehobbyguy Posted July 19, 2024 Whether it's incompetence or malevolence, the end result is the same. Pure chaos.
Rob Hall Posted July 19, 2024 Several third-party integrations my company has are experiencing outages. Windows users in my company are getting BSODs trying to log in. My team is fine, we are all on Macs. I'm on PTO today but reading messages in Slack from my team.
Bainford Posted July 20, 2024 Maybe I'm just an old curmudgeon, or perhaps I'm a Luddite at heart, but I love it when this stuff happens.
Dave Ambrose Posted July 20, 2024 I think a few people are going to get yelled at, both at Microsoft and CrowdStrike. CrowdStrike produced a defective update, but Microsoft was supposed to verify and authenticate the patch. I wouldn't fire them though. It will be hard to find anyone else with such a fine appreciation for the consequences of a mistake like this.
stitchdup Posted July 20, 2024 Who'd a thunk that everybody using the same antivirus program could be a problem? Well, apart from the guy on Reddit who posted a conspiracy theory about CrowdStrike on Thursday night, about 2 hours before it crashed and did everything he said it would. This just proves that monopolies are not a good thing, cos this stuff can happen. Variety is the spice of life, after all.
MeatMan Posted July 20, 2024 It's kinda like painting models, kinda. If you're going to use a new paint, test it first! When I was a warehouse analyst and IT was issuing an upgrade or patch, they put it on a separate system called a "sandbox". Then we tested it until it broke or didn't. I'd hazard a guess that they don't do that anymore because of "resource constraints". Fancy words for reductions in force prompted by a desire to reduce overhead. Plus, they use AI for a lot of coding now, and AI never makes mistakes!
James2 Posted July 20, 2024 My wife and I are paid through Paycor, and it turns out our local bank was affected. No deposits were made on Friday (payday), and to further enhance the experience, the bank is on a scheduled blackout while they update their system. No branches open, no drive thru, no ATM and limited or no debit card usage. Just a gentle reminder why we need cash in hand. Let the conspiracies begin...
Ace-Garageguy Posted July 20, 2024 Author 1 minute ago, James2 said: ...Just a gentle reminder why we need cash in hand. Exactly. But there are millions who won't heed the message. Stuff happens, whether deliberate or through incompetence. Best to be prepared.
Dave Van Posted July 20, 2024 My tech job had ZERO tolerance for failure. That's why we tested over and over before any updates.
bobthehobbyguy Posted July 20, 2024 7 hours ago, Dave Ambrose said: I think a few people are going to get yelled at, both at Microsoft and CrowdStrike. CrowdStrike produced a defective update, but Microsoft was supposed to verify and authenticate the patch. I wouldn't fire them though. It will be hard to find anyone else with such a fine appreciation for the consequences of a mistake like this. Exactly. Been looking at a couple of analyses of this issue. It's going to take several weeks to fix this. Because of the nature of the problem, the fix will need to be implemented manually.
Rob Hall Posted July 20, 2024 After almost 3 decades in software development, I’ve seen all sorts of standards and procedures for production deployment. So many places have limited or no testing/QA environments and put things in production that may be buggy or poorly tested; even big Fortune 50 companies do it.
thatz4u Posted July 20, 2024 Maybe the software problem happened so that AI could take over the web completely. Sarah Connor was right...
bobthehobbyguy Posted July 20, 2024 24 minutes ago, Rob Hall said: After almost 3 decades in software development, I’ve seen all sorts of standards and procedures for production deployment. So many places have limited or no testing/QA environments and put things in production that may be buggy or poorly tested; even big Fortune 50 companies do it. Yup. Hey, what possibly could go wrong... I worked for an outfit that did laser printers. We had an issue where there was no check to see if the update file had been properly loaded to the printer. We bricked a few printers, and after that the download got a check to see if the file was good.
Ace-Garageguy Posted July 20, 2024 Author 1 hour ago, Rob Hall said: After almost 3 decades in software development, I’ve seen all sorts of standards and procedures for production deployment. So many places have limited or no testing/QA environments and put things in production that may be buggy or poorly tested; even big Fortune 50 companies do it. Unfortunately, the trend in hardware engineering is towards "zero prototypes", where items, even complex and highly stressed things like automatic transmissions and engines, would be "tested" in simulation and then taken directly to production with no physical prototypes ever. What could possibly go wrong? EDIT: Actually, it would probably be just fine if the simulation software and metrics were all perfect. But as this latest incident proves conclusively, GIGO is alive and well. The early failures and particular failure modes of many of today's engines and gearboxes should be a red flag to manufacturers that there need to be more greasy hands in the automotive development process, and more stringent real-world testing procedures. Oh well. It's not my dog, and every time I bring something like this up as it applies to vehicles, I'm shouted down by the "how much better everything today is" crowd. Edited July 20, 2024 by Ace-Garageguy
bobthehobbyguy Posted July 20, 2024 The biggest problem these days is the attitude that if it doesn't work we can fix it later. However, in this case getting it right would have taken orders of magnitude less effort than implementing the fix. It reminds me of an old adage: "act in haste and repent at leisure".
HomerS Posted July 21, 2024 I'm envisioning the South Park episode with all the 'experts' standing around while Kyle resets the Internet.
Dave Ambrose Posted July 22, 2024 So, was reading this afternoon that McAfee had a similar problem some years ago. Oddly enough, the current CEO of CrowdStrike was CTO of McAfee at the time that all went down. Guess they didn't learn anything at McAfee. Also long rant about companies cutting personnel until they don't function very well.
bobss396 Posted July 22, 2024 On 7/20/2024 at 10:09 AM, thatz4u said: Maybe the software problem happened so that AI could take over the web completely. Sarah Connor was right... Or the Unabomber was right. "Eventually our technology will fail us".
bobthehobbyguy Posted July 23, 2024 Interesting article. https://www.msnbc.com/opinion/msnbc-opinion/crowdstrike-outage-safety-update-rcna163127
Dave Ambrose Posted July 24, 2024 11 hours ago, bobthehobbyguy said: Interesting article. https://www.msnbc.com/opinion/msnbc-opinion/crowdstrike-outage-safety-update-rcna163127 I have mixed feelings about this article. First off, it comes from Microsoft, one of the active agents in creating the CrowdStrike debacle. That's a major conflict of interest. Secondly, they characterize the underlying problem as small. Ahem, catastrophic failure of kernel-level code is never a small error. Third-party analysis of the code indicates that the problem is triggered by a combination of poor coding and possible data corruption. You need a special kind of engineer for this project. You need one that relies more on design, and less on testing, to ensure correctness. It's a "do it right the first time" attitude that you don't see very often in any generation. They tend to look less productive than sloppier coders, so they tend to not get promoted, and leave for greener pastures.
bobthehobbyguy Posted July 24, 2024 I agree, Dave. Any issue with critical code like this can be disastrous. However, an interesting point was that there were companies that were not directly affected by the issues but had other vendors who were, with the end result being that they were affected too.
Ace-Garageguy Posted July 24, 2024 Author 12 hours ago, Dave Ambrose said: ...It's a "do it right the first time" attitude that you don't see very often in any generation... Agreed 100%. Get it out the door, get paid, worry about whether it's right or not later if at all.