Over just a few months, ChatGPT went from correctly answering a simple math problem 98% of the time to just 2%, study finds

leninmummy@lemmy.ml · 1 year ago

TheSaneWriter@lemmy.thesanewriter.com · 1 year ago

It can probably still write boilerplate code, but I wouldn’t currently trust it for algorithmic design.

remotedev@lemmy.ca · 1 year ago

I’ve tried to use it for debugging by copying code into it, and it gives me the same code back as the "corrected" version. I was wondering why it’s been getting worse.

TheSaneWriter@lemmy.thesanewriter.com · 1 year ago

My guess is they’ve been trying to make it cheaper by decreasing the amount of time it spends on each response, or by decreasing the amount of computing power that goes into the instance you’re speaking to. Coding and math are products of high-level cognition that emerge only in very sophisticated neural networks; take just a bit of power away and those abilities degrade rapidly.

agissilver@lemmy.world · 1 year ago

I also experienced this issue last week. I asked for a specific correction and got unchanged code back. Sometimes it does update the code, though, maybe in 50-70% of requests.

EmilieEvans@lemmy.ml · 1 year ago

I tried basic embedded tasks a week ago: complete trainwreck.

The mistakes ranged from using I2C to read out the internal temperature sensor on a Puya F030 (retested with an STM MCU and an AVR: same answer, just with F030 replaced by STM32F103 in the code) to claiming that the WCH CH32V307 is made by STM and uses an ARM M4 core.

After I told it not to use I2C, it gave a different answer: once more, gibberish that merely looked like code.

What made this entirely embarrassing is that all a human would need to do to solve the problem is copy-paste the question into Google and click the first link, which leads to the manufacturer's example project/code hosted on GitHub.

SokathHisEyesOpen@lemmy.ml · 1 year ago

Today it randomly decided to hide the results of some code that was supposed to be returned from a function. I asked it why it chose to hide the results and it couldn’t tell me; it just apologized and then gave me the code without the hide logic. Pretty strange, actually, since we had been working on the code for half an hour and then, all of a sudden, it decided to hide it all on its own.

SokathHisEyesOpen@lemmy.ml · 1 year ago

Yes! I use it at work almost every day. Sometimes it takes longer to get it to solve the problem than it would have taken me to write it, since it makes mistakes, but sometimes it saves me hours of coding and thinking. It is very helpful in debugging error codes and stuff like that, since it can evaluate an entire 1000-line script file in half a second.

StarkillerX42@lemmy.ml · 1 year ago

I’ve never been able to get a solution that was even remotely correct. Granted, most of the time I ask ChatGPT is when I’m having a hard time solving it myself.

SokathHisEyesOpen@lemmy.ml · 1 year ago

You need to be able to clearly describe the problem, and your expected solution, to get it to give quality answers. Type out instructions for it like you would type for a junior developer. It’ll give you senior level code back, but it absolutely needs clear and constrained guidelines.

exscape@kbin.social · 1 year ago

I mostly agree. I’ve had good results with similar prompts, but there’s usually some mistake in there. It seems particularly bad with Python imports: it’ll use classes A, B and C, import A, B and X, and call it a day.
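
Mismatched imports like that are easy to catch mechanically before you even run the generated code. Below is a rough sketch using Python's standard `ast` module that lists names a snippet uses but never imports or defines, and names it imports but never uses. The helper `check_imports` and the `Popup` snippet are hypothetical illustrations, and the check is only a heuristic: it ignores scoping, star imports and names created at runtime. A real linter such as pyflakes reports both kinds of problem properly.

```python
import ast
import builtins


def check_imports(source: str) -> tuple[set[str], set[str]]:
    """Return (used_but_undefined, imported_but_unused) for a Python snippet.

    Hypothetical helper; a rough module-level heuristic that ignores scoping,
    star imports and names created dynamically at runtime.
    """
    tree = ast.parse(source)  # parse only; the snippet is never executed
    imported: set[str] = set()
    defined: set[str] = set()
    used: set[str] = set()

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c"
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports can't be resolved statically
                    imported.add(alias.asname or alias.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)  # function/lambda parameters, e.g. "self"
        elif isinstance(node, ast.ExceptHandler) and node.name:
            defined.add(node.name)  # "except E as e" binds "e"
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)
            else:
                defined.add(node.id)  # assignment targets, loop variables, etc.

    missing = used - imported - defined - set(dir(builtins))
    unused = imported - used
    return missing, unused


if __name__ == "__main__":
    # Hypothetical generated snippet with the "uses A, B, C but imports A, B, X"
    # problem: QTimer is used but never imported, QPushButton is never used.
    snippet = '''
from PySide6.QtWidgets import QDialog, QLabel, QPushButton

class Popup(QDialog):
    def __init__(self):
        super().__init__()
        self.label = QLabel("hi")
        self.timer = QTimer(self)
'''
    missing, unused = check_imports(snippet)
    print("used but not imported:", sorted(missing))
    print("imported but unused:", sorted(unused))
```

Because the snippet is only parsed, not executed, this works even when PySide6 isn't installed.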

Here are a few prompts that gave pretty good results:

Create a QDialog class that can be used as a modal dialog. The dialog should update itself every 500 ms to call a supplied function, and show the result of the call as a centered QLabel.

How can I make a QDialog move when the user clicks and drags anywhere inside it? The QDialog only contains two QLabel widgets.

For this one, it ignored the method I asked it to use – but it was possibly correct in doing so, as it doesn’t support arbitrary sizes (but I think that’s only for the request?):

Hi again! Can you write me a Python function (using PySide) to connect to a named pipe server on Windows? It should use SetNamedPipeHandleState to use PIPE_READMODE_MESSAGE, then TransactNamedPipe to send a request (from a method parameter) to a named pipe, then read back a response of arbitrary size.

It should have told me up front why it didn’t use TransactNamedPipe, but when I pointed out that it had ignored my request, it explained why.