Study Finds That 52 Percent of ChatGPT Answers to Programming Questions Are Wrong

katy ✨

I'll use Copilot in place of most of my Stack Overflow searches, or for mundane things like generating repetitive code, but relying solely on it is the same as relying solely on Stack Overflow.

crossmr

The best method I’ve found for using it is to have it help you with languages you may have lost familiarity with, and to walk it through what you need step by step. This lets you evaluate its reasoning. When it gets stuck in a loop:

Try A!
Actually A doesn’t work because that method doesn’t exist.
Oh sorry Try B!
Yeah B doesn’t work either.
You’re right, so sorry about that, Try A!
Yeah… we just did this.

at that point it’s time to just close it down and try another AI.

@[email protected]

Probably more than 52% of what programmers type is wrong too

@[email protected]

We mostly suck in emails.

lurch (he/him)

The one time it was helpful at work was when I used it to thank someone who had left a company we work with and wish them well. I couldn’t come up with a good response and ChatGPT just spat real good stuff out in seconds. This is what it’s really good for.

@[email protected]

Yeah things that follow a kind of lexical “script” that you don’t want to get creative with would be pretty easy to generate. Farewells, greetings, dear Johns, may he rest in peaces, etc etc

@[email protected]

ChatGPT: I’m happy for you though, Or sorry that happened

@[email protected]

ChatGPT just spat real good stuff out in seconds

There’s an entire episode of South Park centered around this premise.

haui

The interesting bit for me is that if you ask a rando some programming questions they will be 99% wrong on average I think.

Stack overflow still makes more sense though.

@[email protected]

I worked for a year developing in Magento 2 (an open source e-commerce suite which was later bought up by Adobe; it is not well maintained and is just all around not nice to work with). I tried asking ChatGPT some Magento 2 questions to figure out solutions to my problems, but clearly the only data it was trained on was a lot of really bad solutions from forum posts.

The solutions did kinda work some of the time, but the way it was suggesting to do it was absolutely horrifying. We’re talking opening so many vulnerabilities, breaking many parts of the suite as a whole, or just editing database tables directly. If you do not know enough about the tools you are working with, implementing solutions from ChatGPT can be disastrous, even if they end up working.

@[email protected]

I would make some 1000 monkeys with typewriters comment, but I see what most actual contracted devs produce…

THCDenton

It was pretty good for a while! They lowered the power of it like Immortan Joe. Do not become addicted to AI.

dullbananas (Joseph Silva)

If you become addicted to ChatGPT then that makes you a cloud cyborg

@[email protected]

What’s especially troubling is that many human programmers seem to prefer the ChatGPT answers. The Purdue researchers polled 12 programmers — admittedly a small sample size — and found they preferred ChatGPT at a rate of 35 percent and didn’t catch AI-generated mistakes at 39 percent.

Why is this happening? It might just be that ChatGPT is more polite than people online.

It’s probably more because you can ask it your exact question (not just search for something more or less similar) and it will at least give you a lead that you can use to discover the answer, even if it doesn’t give you a perfect answer.

Also, who does a survey of 12 people and publishes the results? Is that normal?
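
To put a rough number on why 12 people is so few: here is a minimal sketch of the usual normal-approximation margin of error for a proportion. It assumes, purely for illustration, that the 35 percent figure is a simple proportion over 12 independent respondents, which may not match how the study actually computed it. The function name `margin_of_error` is made up for this example.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion.

    Uses the normal approximation, which is itself shaky at n = 12,
    so treat the result as a ballpark only.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Illustrative assumption: 35% observed among 12 independent respondents.
moe = margin_of_error(0.35, 12)
print(f"35% +/- {moe * 100:.0f} percentage points")
```

Under those assumptions the uncertainty comes out at roughly 27 percentage points either way, so the "true" rate could plausibly be anywhere from under 10 percent to over 60 percent.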

@[email protected]

I have 13 friends who are researchers and they publish surveys like that all the time.

(You can trust this comment because I peer reviewed it.)

@[email protected]

Even this Lemmy thread has more participants than the survey

Max-P

I don’t even bother trying with AI, it’s not been helpful to me a single time despite multiple attempts. That’s a 0% success rate for me.

@[email protected]

For people doing a study on LLMs, they don’t seem to know much about LLMs.

They don’t even mention which model was used…

Here’s the study used for this clickbait garbage:

https://dl.acm.org/doi/pdf/10.1145/3613904.3642596

@[email protected]

I’m a 10 year pro, and I’ve changed my workflows completely to include both chatgpt and copilot. I have found that for the mundane, simple, common patterns copilot’s accuracy is close to 9/10 correct, especially in my well maintained repos.

It seems like the accuracy of simple answers is directly proportional to the precision of my function and variable names.

I haven’t typed a full for loop in a year thanks to copilot, I treat it like an intent autocomplete.

ChatGPT, on the other hand, is remarkably useful for super well laid out questions, again with extreme precision in the terms you lay out. It has helped me in greenfield development with unique and insightful methodologies to accomplish tasks that would normally require extensive documentation searching.

Anyone who claims llms are a nothingburger is frankly wrong, with the right guidance my output has increased dramatically and my error rate has dropped slightly. I used to be able to put out about 1000 quality lines of change in a day (a poor metric, but a useful one) and my output has expanded to at least double that using the tools we have today.

Are LLMs miraculous? No, but they are incredibly powerful tools in the right hands.

Don’t throw out the baby with the bathwater.

sylver_dragon

I think AI is good at giving answers to well defined problems. The issue is that companies keep trying to throw it at poorly defined problems and the results are less useful. I work in the cybersecurity space and you can’t swing a dead cat without hitting a vendor talking about AI in their products. It’s the new, big marketing buzzword. The problem is that finding the bad stuff on a network is not a well defined problem. So instead, you get the unsupervised models faffing about, generating tons and tons of false positives. The only useful implementations of AI I’ve seen in these tools actually mirror your own: they can be scary good at generating data queries from natural language prompts. Which is, once again, a well defined problem.

Overall, AI is a tool and used in the right way, it’s useful. It gets a bad rap because companies keep using it in bad ways and the end result can be worse than not having it at all.

@[email protected]

In fairness, it’s possible that if 100 companies try seemingly bad ideas, 1 of them will turn out to be extremely profitable.

@[email protected]

On the other hand, using ChatGPT for your Lemmy comments sticks out like a sore thumb

FaceDeer

If you’re careless with your prompting, sure. The “default style” of ChatGPT is widely known at this point. If you want it to sound different you’ll need to provide some context to tell it what you want it to sound like.

Or just use one of the many other LLMs out there to mix things up a bit. When I’m brainstorming I usually use Chatbot Arena to bounce ideas around, it’s a page where you can send a prompt to two randomly-selected LLMs and then by voting on which gave a better response you help rank them on a leaderboard. This way I get to run my prompts through a lot of variety.

@[email protected]

Omg, I feel sorry for the people cleaning up after those codebases later. Maintaining that kind of careless “quality” lines of code is going to be a job for actual veterans.

And when we’re all retired or dead, the whole world will be a pile of alien artifacts from a time when people were still able to figure stuff out, and llms will still be ridiculously inefficient for precise tasks, just like today.

https://youtu.be/dDUC-LqVrPU

@[email protected]

Anyone who claims llms are a nothingburger is frankly wrong,

Exactly. When someone says that, it indicates to me either that they are ignorant (like they aren’t a programmer or haven’t used it) or that they are a programmer who has used it but is not at all good at integrating new tools into their development process.

Don’t throw out the baby with the bathwater.

Yup. The problem I see now is that every mistake an ai makes is parroted over and over here and held up as an example of why the tech is garbage. But it’s cherry picking. Yes, they make mistakes, I often scratch my head at the ai results from Google and know to double check it. But the number of times it has pointed me in the right direction way faster than search results has shown to me already how useful it is.

@[email protected]

Refreshing to see a reasonable response to coding with AI. Never used chatgpt for it but my copilot experience mirrors yours.

I find it shocking how many developers seem to have such negative views about programming with AI. Some guy recently said “everyone in my shop finds it useless”. Hard for me to believe they actually tried copilot if they think that.

@[email protected]

I’m a 10 year pro,

You wish. The sheer idea of calling yourself a “pro” disqualifies you. People who actually code and know what they are doing wouldn’t dream of giving themselves a label beyond “coder” / “programmer” / “SW Dev”. Because they don’t have to. You are a muppet.

@[email protected]

elon?

chiisana

Here we observe a pro gatekeeper in their natural habitat…

@[email protected]

Hey! So you may have noticed that you got downvoted into oblivion here. It is because of the unnecessary amount of negativity in your comment.

In communication, there are two parts - how it is delivered, and how it is received. In this interaction, you clearly stated your point: giving yourself the title of pro oftentimes means the person is not a pro.

What they received, however, is far different. They received: ugh this sweaty asshole is gatekeeping coding.

If your goal was to convince this person not to call themselves a pro going forward, this may have been a failed communication event.

@[email protected]

while your measured response is appreciated, I hardly consider a few dozen downvotes relevant, nor do I care in this case. It’s telling that those who did respond to my comment seem to assume I would consider myself a “pro” when that’s 1) nothing I said and 2) it should be clear from my comment that I consider the expression cringy. Outside memeable content, only idiots call themselves a “pro”. If something is my profession, I could see someone calling themselves a “professional <whatever>” (not that I would use it), but professional has a profoundly distinct ring to it, because it also refers to a code of conduct / a way to conduct business.

“I’m a pro” and anything like it is just hot air coming from bullshitters who are mostly responsible for enshittification of any given technology.

@[email protected]

A lot of rage for a small amount of confidence

@[email protected]

I’ve found that the better I’ve gotten at writing prompts and giving it enough information to not hallucinate, the better answers I get. It has to be treated as what it is, a calculator that can talk: make sure it has all of the information and it will find the answer.

One thing I have found to be super helpful with GPT-4o is the ability to give it full API pages so it can update and familiarise itself with what it’s working with.

@[email protected]

As a fellow pro, who has no issues calling myself a pro, because I am…

You’re spot on.

The stuff most people think AI is going to do - it’s not.

But as an insanely convenient auto-complete, modern LLMs absolutely shine!

Melkath

Developing with ChatGPT feels bizarrely like when Tony Stark invented a new element with Jarvis’ assistance.

It’s a prolonged back and forth, and you need to point out the AI’s mistakes and work through a ton of iterations to get something that is close enough that you can tweak it and use it, but it’s SO much faster than trawling through Stack Overflow or hoping someone who knows more than you can answer a post for you.

elgordio

Yeah, if you treat it as a junior engineer with the ability to instantly research a topic, and are prepared to engage in a conversation to work toward a working answer, then it can work extremely well.

Some of the best outcomes I’ve had have needed 20+ prompts, but I still arrived at a solution faster than any other method.

Melkath

In the end, there is this great fear of “the AI is going to fully replace us developers” and the reality is that while that may be a possibility one day, it won’t be any day soon.

You still need people with deep technical knowledge to pilot the AI and drive it to an implemented solution.

AI isn’t the end of the industry, it has just greatly sped up the industry.

@[email protected]

Worth noting this study was done on GPT-3.5, and GPT-4 is leagues better than 3.5. I’d be interested to see how this number has changed.

@[email protected]

There is a huge gap between 3.5 and 4, especially in coding related questions. GPT-3.5 does not have a large enough token limit to handle harder code related questions.

@[email protected]

4 made up functions that didn’t exist the last time I asked it a programming question.
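
For what it’s worth, a cheap sanity check before trusting a suggested call is to ask the interpreter whether the thing exists at all. Below is a minimal sketch for Python libraries; the helper name `function_exists` is made up for this example, and it only confirms that a callable with that name is present, not that it does what the model claimed or takes the arguments the model used.

```python
import importlib


def function_exists(module_name: str, attr_path: str) -> bool:
    """Return True if module_name imports and attr_path resolves to a callable.

    attr_path may be dotted, e.g. "path.join" inside the "os" module.
    """
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing (or was hallucinated).
        return False
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return callable(obj)


print(function_exists("math", "sqrt"))       # a real function
print(function_exists("math", "fast_sqrt"))  # a plausible-sounding invented one
```

The same idea carries over to other languages: let the compiler, the REPL, or the official docs confirm a name before building on it.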

FaceDeer

This is why I like Bing Chat for this kind of thing, it does a web search in the background and will often be working right from the API documentation.

@[email protected]

sure, I’m not saying GPT4 is perfect, just that it’s known to be a lot better than 3.5. Kinda why I would be interested to see how much better it actually is.

Ech

For the umpteenth time - an LLM just puts words together, it isn’t a magic answer machine.

@[email protected]

Yeah but it’s just going to get better at magicking. Soon all us wizards will be out of a job…

@[email protected]

Just as soon as we no longer need to drive.

chiisana

Self driving cars need to convince regulators that they’re safe enough, even assuming they master the tech.

LLMs have already convinced our bosses that we are expendable, and that they can drastically reduce cost centres for their next earnings call.

Naminreb

A parrot blabbing the theory of relativity doesn’t make it Einstein.

@[email protected]

Ask “are you sure?” and it will apologize right away.

Lemongrab

And then agree with whatever you said, even if it was wrong.
