Doing the Ollama LangChain RAG
created: 2025/09/05

Summary

I am implementing RAG in Python using LangChain and Ollama, together with retrievers and vector stores, to provide ranked context to a local tool-calling LLM.

Update

Here is a link to the code: LangChain RAG Demo. I spent a little time cleaning up some things that were not great about the original.

It is still just a really simple demo: it does not do proper text splitting based on document type, it does not handle HTML documents, and it is missing many other things. When I have some time I will write something up about the code itself, but for now you can read it yourself.

It was based on the tutorials from LangChain, like this one: Build a Retrieval Augmented Generation (RAG) App: Part 2.

Vetting AI Tools

I have been testing and vetting a LOT of AI tools lately on a lot of platforms, and although there are a considerable number of frameworks and libraries available, the bulk of them are wrappers around services where you pay to do ___. While I can certainly understand that model, it doesn’t really help me, a poor unemployed developer who is trying to build awesome AI-powered tools.

There are also a lot of AI tools that look really promising but then suffer from common deployment issues. They are essentially just badly packaged or suffer from bad dependency management and similar problems, which to be fair are fairly common in the land of Python. (I know all about dependency hell; I used to write Perl.)

Funnily enough, I used to do DevOps and production releases, and I used to be pretty good at reproducible environments, but I can’t find employment currently... go figure.

Even among the free tools that you can actually install and run, most of them under-deliver on features right now, or at least the features I want out of the box.

I recently saw this post from Continue showing their new CLI tool. I had really high hopes for it, since the Code integration works great and its use of YAML configs to define prompts, rules, models, agents, and context is great. I agree with a lot of their ideas from amplified.dev as well. But the current CLI tool is in beta and, for me, had too many issues to list: everything from configs that work in the UI not working in the CLI, to the CLI just hanging forever.

I was turning to these tools as a shortcut to writing my own solutions for some things I want to automate, so when they immediately don’t work I can’t help but feel like I wasted my time.

I also had really high hopes for the LM Studio Python SDK. LM Studio is a pretty great UI, supports MCP servers (external tools), and tons more. So of course I thought, “great, I can get that same functionality with the SDK”... but it’s just not quite there yet. It has tool calls and some basic embedding support, but the documentation is lacking at best, and at worst it just feels like pieces are missing.

Those are just two of the twenty tools I have looked at, so that is many hours of my life I will never get back.

LangChain FTW

In my testing LangChain has emerged as a clear winner.

Back in 2024, when I was doing some testing on a DearPyGUI front end for an AI agent, I ended up relying on some parts of LangChain to implement a simple streaming chatbot. The API is different now, so it’s not totally relevant, but it was something I had used a bit back then.

In 2025 it’s come a long way and the latest API is really nice.

I currently have some demo code working that can load documents, split them up, convert the split texts into embeddings, and create a persistent vector store from those embeddings. I also have code in place to expose a tool call to an LLM, which it can use to fetch relevant context from the vector store for a query and then make use of that data in its response. A rough sketch of the pipeline is below.
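Here is a minimal sketch of that flow using LangChain, Chroma, and Ollama. It assumes Ollama is running locally with an embedding model (nomic-embed-text here) and some tool-calling chat model pulled; the directory path, collection name, model names, and chunk sizes are placeholders, not necessarily what my demo uses:

```python
# Sketch: load -> split -> embed -> persist -> expose retrieval as a tool.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_core.tools import tool

# 1. Load documents (plain text only here; real code would pick loaders per type).
docs = DirectoryLoader("./docs", glob="**/*.txt", loader_cls=TextLoader).load()

# 2. Split them into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)

# 3. Embed the chunks and persist them in a local vector store.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vector_store = Chroma(
    collection_name="demo",
    embedding_function=embeddings,
    persist_directory="./chroma_db",
)
vector_store.add_documents(splits)

# 4. Expose retrieval as a tool the local LLM can call.
@tool
def retrieve_context(query: str) -> str:
    """Fetch document chunks relevant to the query from the vector store."""
    hits = vector_store.similarity_search(query, k=4)
    return "\n\n".join(doc.page_content for doc in hits)

# "qwen3" is just an example of a tool-calling model available through Ollama.
llm = ChatOllama(model="qwen3").bind_tools([retrieve_context])
```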

I am in the process of improving my demo code to include additional retrievers with re-ranking, to provide the LLM with more relevant, up-to-date context for its responses when needed, among other things.
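One way the re-ranking step could look is LangChain’s ContextualCompressionRetriever wrapping a local cross-encoder: over-fetch candidates from the vector store, then keep only the top-ranked few. This assumes langchain-community and sentence-transformers are installed; the model name and k values are just placeholders:

```python
# Sketch: wide first-pass retrieval, then cross-encoder re-ranking.
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

base_retriever = vector_store.as_retriever(search_kwargs={"k": 20})  # over-fetch
reranker = CrossEncoderReranker(
    model=HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base"),
    top_n=4,  # only the best-ranked chunks reach the LLM
)
reranking_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever,
)

docs = reranking_retriever.invoke("How do I configure the vector store?")
```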

Much like Continue (in structure), I plan to expose the creation of vector stores so that a user can create and reference various contexts, to provide some additional control over which information is being retrieved (I have a quick-and-dirty version of this working already).

For example, there is no reason to search Python documentation when generating a response about JavaScript, so being able to exclude those embeddings entirely is appealing.
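A rough sketch of what that scoping could look like: one persistent Chroma collection per context, so a query only ever touches the collection it is scoped to (collection and topic names here are hypothetical):

```python
# Sketch: per-topic collections so retrieval can be scoped or excluded entirely.
from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

def get_store(topic: str) -> Chroma:
    """Open (or create) a persistent collection named after the topic."""
    return Chroma(
        collection_name=topic,
        embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
        persist_directory="./chroma_db",
    )

python_docs = get_store("python-docs")
js_docs = get_store("javascript-docs")

# A JavaScript question never touches the Python collection.
hits = js_docs.similarity_search("How do template literals work?", k=4)
```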

The End Game

I want to build a fully local, multi-modal AI agent using a mixture of RAG, tool calling, LLMs, and custom tooling to enable the agent to queue tasks, process and verify the results, use additional models, and so on.

I essentially want to be able to queue up various work, and when I come back I want to see progress that has been vetted and verified, not just random output I then have to vet myself.

It’s basically part of a programmatic loop I want to set up so I can further develop some ideas I have.

Eventually I want to put together some hardware specifically to run this multi-modal agent, or perhaps a few pieces, since I want to provide a full agent interface to explore improving the human-AI interaction experience, as well as some ideas I have around a household AI agent that multiple users would interact with.

I am also interested in exploring custom UI/UX around these tools, as I personally feel like most tools today have AI glued onto them, instead of being tools created from the ground up to utilize AI models.

Help Support Me

If this sounds exciting to you, or if you can provide financial support, compute resources, or hardware for my research and work, or just want to chat, then I would love to hear from you. You can reach out to me here, email me, contact me on LinkedIn, support me on Ko-fi, or donate on PayPal. If you know someone else who might be interested, feel free to share this with them.

I will be posting my demo code and eventually my agent code to GitHub, so stay tuned. There will be a write-up about it here as well.

Thanks for reading!

The Meta

0.1

2025/09/05
  • First Draft

0.2

2025/09/08
  • Linked to some code