Large Language Models


To say that "machine learning" has been a hot topic lately would be an understatement. Specifically, the subtopic of "large language models" has taken the world by storm, for better or worse, and for some those two terms are one and the same. For this post I'm specifically talking about "large language models" (LLMs) since that's the only topic I feel confident enough to speak on when it comes to machine learning.

Some of what I say may be wrong, or should be taken with a grain of salt. I am by no means an expert. I am also not a complete fandog of LLMs. Also, I do not endorse the use of machine learning for generating "art", "stories", or other slop.

Upfront, I'll admit, I use LLMs from time to time, but far less often than some may think. A lot of my usage comes from an interest in seeing them evolve, with new training and quantization methods, and I haven't been able to find a good use for them in day-to-day life beyond better code completion, which I'll touch on later. I'm aware this differs from person to person, and if you do find LLMs to be a useful tool, then fantastic! At the end of the day, I do see them as tools, a somewhat lossy way of searching for information. I don't imbue them with some personality - the models are there for me to retrieve information with "natural language", often as a rubber ducky to trigger further investigation using traditional tools. I do not trust their output, as it can often be wrong, but it is a mathematically average response to my query.

And that's where I stand with LLMs. They're a mathematical formula for finding the connections between words and stringing together sentences based on averages. This in itself is not inherently good or evil, but the way they are constructed and run can be very unethical. And indeed, many of the ways they're being adopted or shoehorned into products are straight up wrong.

On the topic of training, the process of creating these models: there are a fair number of justified opponents to it, from writers having their work consumed as little more than "content" in the eyes of the machines, to system administrators having resources eaten up by aggressive crawlers. Generally this can be summed up as companies not understanding the concept of consent, which is a much larger issue within the technology sector. If I had a dime for every time I saw a "yes / later" prompt, I would be a very, very rich dog. I've written before about my thoughts on companies scooping up every bit of data on the internet to train their models, so I'm deliberately shifting the conversation towards consent in this instance.

I do think there are ways to train models without causing issues. These companies just need to learn about the concept of "opt-in" consent, rather than DDoSing websites and stealing the hard work of others (Note: I am deliberately avoiding the use of the term "content". People put a lot of effort into their work, and boiling it all down to generic "content" makes it seem less valuable). I also believe that if opt-in was more prevalent, the tooling for exposing data to train models would get better and lessen the load on both sysadmins and crawlers alike. But I do not believe that this will happen anytime soon.

Selling access to large language models, given how the training is done, really gets to me. They will happily ingest an incalculable amount of human effort, distill it into a black box, then happily sell it back to you, often for an insane price (although happily, it seems, not for a profit!). I gravitate towards using local models, which run on my own hardware and are completely free to use. These are often called "open source models", but I hesitate to use that term as one of the key pieces of these models, the training data itself, is not shared. While the training is still antithetical to my beliefs, it is at least possible to use these models without giving money to the company. Although they tend to be smaller, they contain more than enough data to be usable (at least for my small use case) and require far fewer resources than whatever the huge hosted models need to run, and as a plus you aren't providing even more free training data to the backing companies.
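To make "running them locally" slightly more concrete, here's a minimal sketch of what talking to a local model can look like. It assumes an Ollama-style HTTP server listening on localhost and a small model pulled under the name "llama3.2"; both the tool and the model name are assumptions about your setup, not a recommendation of any particular stack.

```python
# Minimal sketch: query a locally hosted model over HTTP.
# Assumes an Ollama-style server on localhost:11434 and a locally
# pulled model named "llama3.2"; adjust both to whatever you run.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3.2") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_local_model("Summarise what quantization does to a model."))
```

Nothing in that round trip leaves your machine, which is the entire point.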

If your company isn't adding large language models to its product, it probably will be soon. It's the big hype train to be on, with millions (if not billions) in venture capital funding being thrown around at the mere mention of it. But holy shit, are these LLMs being mis-applied. An LLM can't be a senior software developer (or oftentimes even a junior developer), it can't be your best friend, and for the love of god it can't be your therapist. No matter how good your prompt or your training is. I recently started poking around the /r/ChatGPT subreddit and while I was aware the models were being badly used, I never really paid attention to how individuals were using, or indeed personifying, these algorithms. To be clear - a large language model is not a person. It is not intelligent. Please, I beg you, stop treating it as such. I know that I'm probably going to be preaching to the choir here, so I'll leave it at that. I can't really blame companies for jumping onto the LLM hype train. It's easy money. But I am looking forward to the bubble bursting and LLMs not being shoveled into every product under the sun.

While on the topic of mis-using LLMs, let's briefly touch on "vibe coding". It's not something I've done myself, but I'm aware of its usage and have done a bit of research into the topic. For the uninitiated, vibe coding is essentially giving a large language model complete control over a codebase, with the user acting as more of a product manager than anything else. It's not uncommon for those doing this vibe coding not to care about or review what's output, as long as it works. It should be pretty clear why this is a bad idea, if for no other reason than that there's a lot of terrible code on the internet that these models have hoovered up. A model has no way of knowing if code is "good" or "bad", just that certain tokens are often arranged together in a particular order.

It's important to distinguish "pure" vibe coding and "LLM-assisted coding". Personally, I find LLMs quite decent at producing reasonably context-aware autocomplete for code, but I still review the suggested changes with a critical eye. On the bright side, it's really helped me improve my review process for pull requests. I use a fairly small model for this task, so resource usage is very light. All the context I need is usually the current codebase I'm working on, plus a little about how the language is laid out, and that's good enough. I've found some of the larger local models are pretty okay at HTML and CSS as well, and since I'm pretty bad at coming up with designs myself, being able to quickly generate an average-looking frontend for something is fairly useful.
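For a rough idea of what "context-aware autocomplete" amounts to under the hood, here's a hedged sketch of how an editor plugin might build a fill-in-the-middle request: the code before and after the cursor gets wrapped in the model's FIM markers and sent to a local completion endpoint, much like the one sketched earlier. The marker strings below are placeholders; the exact tokens vary from model to model, so treat them as assumptions rather than gospel.

```python
# Sketch of a fill-in-the-middle (FIM) completion prompt.
# The special markers below are placeholders; the exact strings are
# model-specific, so check the model card for whatever small model you run.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(before_cursor: str, after_cursor: str) -> str:
    """Wrap the code surrounding the cursor so the model fills in the gap."""
    return f"{FIM_PREFIX}{before_cursor}{FIM_SUFFIX}{after_cursor}{FIM_MIDDLE}"

# Example: the model sees the function signature and the return statement,
# and is asked to produce only the missing body in between.
prompt = build_fim_prompt(
    before_cursor="def mean(values: list[float]) -> float:\n    ",
    after_cursor="\n    return total / len(values)\n",
)
# This prompt would then be sent to a local completion endpoint and the
# suggestion reviewed like any other diff before it touches the codebase.
```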

Do I believe this could be done without LLMs? Probably. Static analysis and projects like tree-sitter open up a lot of possibilities. I won't pretend the LLM approach is the optimal solution.

At the end of the day, I don't hate LLMs or advocate for their complete destruction. But if the inherent problems with creating them can't (or won't) be solved, then I hope they go away. It's been difficult to admit this, as within my own social circles LLMs are very disliked, if not outright hated. I've deliberately avoided talking about LLMs in any potentially favourable light, not just because of the hefty ethical concerns behind them but also because I've been unsure how others would react. The topic does have nuance, and is worth discussing properly, not in small snippets on "microblogging" platforms.

I'm more than happy to chat about this or answer questions over on the fediverse, and you're more than welcome to ask to add me on platforms that are better suited for longer form conversations!
