You know, I miss the days when we were trying to build "better" software that solved actual problems.
I'm trying to automate a few manual processes we have right now, but I still can't get over this hump. What am I doing wrong?
I am using these AI APIs for actual processing-type work, and I am left defeated and somewhat angry, if I'm being honest. These AI companies sell us some galaxy-brain vision of automation, but actually using their services is a disappointing experience.
1. The results are never consistent. "Please ensure you extract ALL items" -> [Item1, Item2, Item3, "literally a comment // ...remaining items"] WHAT THE F$#K!! Sometimes it gives me a full list of all items, and sometimes it does that BS. I provided a tool, and half of the time it just grabs the first 3 and maybe it will grab the very last one too (ignoring everything in the middle).
2. Because the results are not reliable, I have to do more post-processing. About 60% of the time, even after post-processing, I have to reject the results because they don't meet my confidence threshold.
3. The APIs are poorly supported by the vendors.
- iOS has some insane behavior where file extensions are sometimes .jpg, sometimes .JPG, etc. OpenAI's API, for example, will return Bad Request because the extension was not ".jpg", so now I have to add more code to rename files when the user uploads them.
- The docs will say the API supports a list of file formats, but then it rejects the request because the file was not a .PDF, even though the purpose was "assistants" (which the docs say can handle images). No problem, I'll just convert..
- Dealing with files coming from other sources (G Drive, etc.) where the extension is missing but the MIME type is present.. Again, bad request.
4. We went from "AGI any day now" in 2024, to "_A_rtificial _S_uper _I_ntelligence any day now" today. Can we just relax? Did I fall for a marketing trap?
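For the file-naming mess in point 3, the workaround boils down to normalizing the name before upload. A minimal sketch in Python, assuming the upload handler hands you a filename and (optionally) a MIME type. `normalize_upload_name` and the `PREFERRED_EXT` table are hypothetical names, and which extensions a given API actually accepts is something you'd have to fill in from its docs or from trial and error:

```python
import mimetypes
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping: adjust to whatever the target API really accepts.
PREFERRED_EXT = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "application/pdf": ".pdf",
}


def normalize_upload_name(filename: str, mime_type: Optional[str] = None) -> str:
    """Return the base filename with a lowercase extension.

    If the name has no extension, try to infer one from the MIME type.
    Any directory part of `filename` is dropped.
    """
    path = PurePosixPath(filename)
    ext = path.suffix.lower()  # ".JPG" -> ".jpg"
    if ext == ".jpeg":
        ext = ".jpg"  # assumption: the API wants the short form

    if not ext and mime_type:
        # Strip parameters such as "; charset=..." before the lookup.
        base_type = mime_type.split(";")[0].strip().lower()
        ext = PREFERRED_EXT.get(base_type) or mimetypes.guess_extension(base_type) or ""

    return f"{path.stem}{ext}"


if __name__ == "__main__":
    print(normalize_upload_name("IMG_0001.JPG"))
    print(normalize_upload_name("scan", "application/pdf"))
```

This only fixes the name; it doesn't check that the bytes match the extension, so a mislabeled file will still get rejected (or worse, accepted).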
I think LLMs are great for applications like Cursor, or for customer support, where they don't need to give "perfect" responses because a human operator will prompt them further. How many times have you had to deal with stupid output from Cursor? (I'm a power user; I deal with this daily.) RAG is a cool application, and there's no real need for correctness or exactness there, IMO. I've fed it hundreds of my notes, which I reference sometimes. I get different answers each time, but I don't need them to be perfect.
:q!