For AI brutes like yours truly, an LLM is only two things: its weights (what comes out of its training) and the context you provide so that you get a good answer. The context is the question you ask the model, plus any accompanying data you present. Weights + prompt = answer. When you get an answer, that answer becomes context for the next question.
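To make that loop concrete, here's a minimal sketch in Python with the model call stubbed out (the llm() function is a made-up stand-in, not a real API). The point is the shape: weights live inside the model, everything else is context, and each answer gets appended to the context for the next turn.

```python
def llm(context: list[str]) -> str:
    """Stand-in for a real model: the weights are baked in, the context is the input."""
    return f"(answer based on {len(context)} context messages)"

context: list[str] = []

for question in ["What is RAG?", "And how does it relate to MCP?"]:
    context.append(f"User: {question}")
    answer = llm(context)                    # weights + context -> answer
    context.append(f"Assistant: {answer}")   # the answer becomes context too
    print(answer)
```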
Some new things, however, have popped up, like RAG and MCP. I decided to spend a bit of time understanding them and placing them in the two-part framework for dummies I showed above.
RAG (retrieval augmented generation) is not about weights, but about context. RAG is a bunch of mechanisms for retrieving context that can improve the answer. Some types of RAG are done by deterministic systems (think of a backend pulling data from a DB and attaching it as extra context to the question). LLMs can also retrieve data from certain sources themselves. Whether it is done for the LLM or the LLM does it, RAG is about bringing in more context. It does not change the weights and it does not retrain the model.
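Here's a toy sketch of the deterministic flavor. The knowledge base, fetch_docs(), and the keyword lookup are all made up for illustration (real systems use search engines or vector stores), and the actual model call is left out, but the shape is the real thing: retrieved text gets pasted into the prompt as extra context.

```python
# Hypothetical mini knowledge base; in reality this would be a DB or index.
KNOWLEDGE_BASE = {
    "performance": "Enable query caching; see runbook section 4.",
    "billing": "Invoices are generated on the 1st of each month.",
}

def fetch_docs(query: str) -> list[str]:
    """Naive keyword retrieval, just to show where retrieval fits in."""
    return [text for topic, text in KNOWLEDGE_BASE.items() if topic in query.lower()]

def build_rag_prompt(question: str) -> str:
    docs = fetch_docs(question)
    # The retrieved docs become extra context; weights are untouched.
    return "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {question}"

print(build_rag_prompt("Why is performance degrading?"))
```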
What about embeddings? This one scared me. It turns out, when you’re doing RAG for the LLM (instead of the LLM doing the searches itself), you don’t just want exact-match searches. If a user is searching for “app is slow”, you want to find the context for “performance optimization”. How do you make that match without asking the LLM to read your entire reference context? You run the query through another model with its own weights (which exist just for embedding purposes) and get embeddings: vectors that place semantically similar text close together, so “app is slow” lands near “performance optimization”. Embeddings therefore use an embedding model’s weights (without changing them) just to point the RAG mechanism at the right context.
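A sketch of the mechanics: the embed() function below is fake, with hand-picked vectors so the example works (a real embedding model produces them from its own fixed weights), but the similarity math is the standard cosine comparison.

```python
import math

# Fake embeddings, hand-picked so that "app is slow" sits near
# "performance optimization" in vector space. A real embedding model
# computes these from its weights.
FAKE_EMBEDDINGS = {
    "app is slow": [0.9, 0.1, 0.2],
    "performance optimization": [0.85, 0.15, 0.25],
    "billing and invoices": [0.1, 0.9, 0.3],
}

def embed(text: str) -> list[float]:
    return FAKE_EMBEDDINGS[text]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = embed("app is slow")
docs = ["performance optimization", "billing and invoices"]
best = max(docs, key=lambda d: cosine(query, embed(d)))
print(best)  # -> "performance optimization"
```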
What about tools? Tools are ways in which LLMs can get input and produce output. Without tools, LLMs can only receive as context what you ask them, and they can only respond with text. With tools, LLMs can fetch data; they can also “do” things (think DB, OS and network calls; also calling deterministic code) instead of just telling you what to do. MCP (Model Context Protocol) is just an emerging protocol for standardizing LLM tools.
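Here's a sketch of the tool loop with the model call stubbed out. This is not the actual MCP wire format, and the tool itself is hypothetical; the general shape is what matters: the model asks for a tool by name, deterministic code runs it, and the result goes back into the context.

```python
import json

def get_order_status(order_id: str) -> str:
    """Deterministic code the LLM can 'do' things with (think a DB call)."""
    return json.dumps({"order_id": order_id, "status": "shipped"})

TOOLS = {"get_order_status": get_order_status}

# Pretend the model responded with a structured tool call instead of text:
model_output = {"tool": "get_order_status", "args": {"order_id": "A-123"}}

# The harness, not the model, executes the tool...
result = TOOLS[model_output["tool"]](**model_output["args"])

# ...and the result becomes context for the model's next turn.
print(f"Tool result: {result}")
```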
Can you change the weights of an existing, closed-source model? Turns out you can, with fine-tuning. You present more data to the model so that its weights are adjusted. I’ve never done it, but it’s possible. That modified copy would just be for you: you don’t get access to the actual weights, only to the answers the fine-tuned model gives you.
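Since I haven’t done it, take this as a guess at the shape rather than a recipe: you prepare example conversations showing the behavior you want and upload them to the provider, which trains a private copy for you. The JSONL format below is an assumption modeled on common hosted chat fine-tuning formats, so check your provider’s docs.

```python
import json

# Hypothetical training examples: conversations demonstrating the
# behavior you want the fine-tuned model to pick up.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Summarize ticket #42."},
            {"role": "assistant", "content": "Login fails on SSO; fix deployed in v1.3."},
        ]
    },
]

# Hosted fine-tuning APIs typically take one JSON example per line (JSONL).
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```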
If there’s a significant error above, please let me know!