The AI Hackathon 2025 featured a series of knowledge-sharing sessions designed to help participants move from idea to execution using practical tools and frameworks. One of the most hands-on sessions was titled “Implementation Infrastructure Modelling,” where participants learned how to build Retrieval-Augmented Generation (RAG) systems using Python, vector databases, and LangChain.
The session was led by Mohammad Nasim Uddin, co-founder and CEO of Connective Studio, a generative AI company that has spent the last two years developing AI-powered products and helping clients build tailored generative AI applications. A common challenge he and his team encountered was that Large Language Models (LLMs) typically rely only on public data, which makes it difficult to inject internal or proprietary company knowledge into AI systems. His solution: RAG, a method that combines an LLM with an internal knowledge base. This session focused on how to implement it using Python, Pinecone, and LangChain.
Setting Up the Environment
Mr. Nasim began by walking through the environment setup. Start by creating a virtual environment (for example, with python -m venv myenv) and activating it with source myenv/bin/activate.
Next, create two files:
- ingest.py, to handle loading data and storing it into the vector database
- search.py, to manage user queries and retrieve relevant responses
Embedding and Storing Knowledge
The ingestion pipeline focuses on text-based files, specifically .txt and .md formats. Since source files are often long, the session introduced LangChain’s RecursiveCharacterTextSplitter to break them into smaller, manageable chunks.
Once the text is chunked, it is passed to the OpenAI Embedding API, which transforms each chunk into a vector representation. These embeddings are then stored in Pinecone, a vector database.
A dedicated function was written to detect file types, load content, split it, and return the processed chunks. The system supports only .txt and .md files for now, returning an error if others are uploaded.
Participants were encouraged to define chunk size and overlap carefully—for example, using 1000 characters per chunk with 200 characters of overlap—to preserve semantic continuity.
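A minimal sketch of that loading-and-splitting helper is below, assuming the langchain-community and langchain-text-splitters packages; the function name load_and_split is illustrative, not the speaker's exact code.

```python
# ingest.py (excerpt): load a .txt or .md file and split it into chunks.
import os

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter


def load_and_split(file_path: str):
    """Detect the file type, load the content, and return processed chunks."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext not in (".txt", ".md"):
        # Only plain-text and markdown sources are handled for now.
        raise ValueError(f"Unsupported file type: {ext}")

    documents = TextLoader(file_path, encoding="utf-8").load()

    # 1000-character chunks with 200 characters of overlap help preserve
    # semantic continuity across chunk boundaries.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return splitter.split_documents(documents)
```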
Creating the Ingestion Script
The main() function in ingest.py accepts the file path as a command-line argument, validates it, and then passes it to the loading and splitting function.
Environment variables are stored in a .env file, including:
- OPENAI_API_KEY
- PINECONE_API_KEY
- INDEX_NAME
These are loaded using the python-dotenv package.
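A small sketch of that configuration step with python-dotenv, using the variable names listed above:

```python
# Load and validate configuration from the .env file.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into os.environ

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
INDEX_NAME = os.getenv("INDEX_NAME")

if not all([OPENAI_API_KEY, PINECONE_API_KEY, INDEX_NAME]):
    raise RuntimeError("Missing one or more required environment variables")
```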
For embeddings, Mr. Nasim recommended the text-embedding-3-small model from OpenAI, which offers a good balance of cost and quality.
To store data, Pinecone.from_documents() is used to create the vector store, and add_documents() adds the embedded content into the database.
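A sketch of that step, assuming the current langchain-openai and langchain-pinecone packages (older LangChain releases expose the same vector store under the class name Pinecone); `chunks` and `more_chunks` stand in for output from the splitting helper sketched earlier:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

# text-embedding-3-small balances cost and quality; the API keys are read
# from OPENAI_API_KEY and PINECONE_API_KEY in the environment.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# from_documents() embeds the chunks and writes them into the named index.
vector_store = PineconeVectorStore.from_documents(
    chunks, embedding=embeddings, index_name=INDEX_NAME
)

# Later batches can be appended to the same store.
vector_store.add_documents(more_chunks)
```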
A new index, such as “aihackathon,” must be created in Pinecone with a matching embedding dimension (e.g., 1536) and cosine similarity as the distance metric.
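The index can be created from the Pinecone console or programmatically; a sketch with the Pinecone Python client is below (the serverless cloud and region values are placeholder assumptions):

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=PINECONE_API_KEY)

if "aihackathon" not in pc.list_indexes().names():
    pc.create_index(
        name="aihackathon",
        dimension=1536,   # must match the embedding model's output size
        metric="cosine",  # distance metric used for similarity search
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
```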
Once set up, running ingest.py with a markdown file stores the data as vector arrays. Additional files, like internal company docs or website content, can be embedded the same way. For larger projects, a loop can automate bulk ingestion from a directory.
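A sketch of how ingest.py might tie these pieces together, including an optional bulk-ingestion helper; it reuses load_and_split(), embeddings, and INDEX_NAME from the sketches above, and the exact structure of the session's script may differ.

```python
# ingest.py (sketch): validate the CLI argument, then chunk and store the file.
import sys
from pathlib import Path


def main():
    if len(sys.argv) != 2:
        sys.exit("Usage: python ingest.py <path/to/file.md>")
    path = Path(sys.argv[1])
    if not path.is_file():
        sys.exit(f"File not found: {path}")

    chunks = load_and_split(str(path))
    PineconeVectorStore.from_documents(chunks, embedding=embeddings, index_name=INDEX_NAME)
    print(f"Ingested {len(chunks)} chunks from {path.name}")


def ingest_directory(directory: str = "docs"):
    """Optional: bulk-ingest every .txt and .md file in a directory."""
    for file in Path(directory).iterdir():
        if file.suffix.lower() in (".txt", ".md"):
            chunks = load_and_split(str(file))
            PineconeVectorStore.from_documents(chunks, embedding=embeddings, index_name=INDEX_NAME)


if __name__ == "__main__":
    main()
```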
Building the Search Interface
After ingesting data, the next step is building a retrieval interface in search.py.
Key packages are imported, including os, dotenv, OpenAI, LangChain, and Pinecone. Environment variables are reloaded and validated. The script re-initializes the OpenAI embedding model and sets up the gpt-3.5-turbo model with a temperature of 0 to keep outputs deterministic and grounded in the retrieved context.
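A sketch of that setup in search.py, assuming the langchain-openai package:

```python
# search.py (excerpt): reload configuration and re-initialize the models.
import os

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

load_dotenv()
INDEX_NAME = os.getenv("INDEX_NAME")

# The embedding model must match the one used during ingestion.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# temperature=0 keeps answers deterministic and close to the retrieved context.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
```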
The existing Pinecone index is reconnected using credentials from the .env file. From there, LangChain’s ConversationalRetrievalChain is used to enable context-aware search with memory.
The number of documents retrieved is capped (e.g., top 3), and a loop is created to allow continuous user interaction. A list is maintained to track conversation history. Each query is passed through the LangChain chain, which handles retrieval and response generation.
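Putting those pieces together, here is a sketch of the retrieval loop, reusing embeddings, llm, and INDEX_NAME from the setup above; the exact chain configuration in the session may differ.

```python
from langchain.chains import ConversationalRetrievalChain
from langchain_pinecone import PineconeVectorStore

# Reconnect to the existing index; no re-ingestion is needed.
vector_store = PineconeVectorStore(index_name=INDEX_NAME, embedding=embeddings)
retriever = vector_store.as_retriever(search_kwargs={"k": 3})  # cap at top 3 chunks

chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever)

chat_history = []  # (question, answer) pairs carried into each new query
while True:
    query = input("Ask a question (or type 'quit'): ").strip()
    if query.lower() in ("quit", "exit"):
        break
    result = chain.invoke({"question": query, "chat_history": chat_history})
    print(result["answer"])
    chat_history.append((query, result["answer"]))
```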
Testing the System
Mr. Nasim demonstrated real examples. When asked, “What is the topic of AI Hackathon?”, the system correctly pulled context from the embedded documents to generate an accurate response.
Another query, “Who is Nasim Uddin?”, returned personal details from the markdown content—proof that internal knowledge was successfully integrated into the LLM’s output via RAG.
He also pointed out that if the LLM part is commented out, the system falls back to raw vector search, returning similarity scores (like 64% or 73%) alongside the most relevant text chunks.
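For reference, that raw-search fallback looks roughly like this, using the vector store from the sketch above; with cosine similarity, a score of 0.64 corresponds to the 64% figure mentioned.

```python
# Raw vector search: no LLM, just the closest chunks and their scores.
results = vector_store.similarity_search_with_score("Who is Nasim Uddin?", k=3)
for doc, score in results:
    print(f"{score:.2f}  {doc.page_content[:80]}")
```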
Behind the Scenes, Simplified
While RAG implementations might seem complex, LangChain simplifies much of the process. It allows developers to focus on logic and workflows without needing to manually handle vector search or chaining models.
However, Mr. Nasim noted that for production-level applications, teams might prefer custom RAG architectures for greater flexibility and control. LangChain supports both Python and JavaScript, and although it’s convenient, it’s not mandatory—developers can also use manual API calls and custom-built logic to achieve similar results.
Final Thoughts
In closing, Mr. Nasim encouraged participants to explore the possibilities of retrieval-augmented AI applications. Whether building a smart FAQ, a document search assistant, or an internal knowledge bot, the RAG approach can be a powerful foundation.
For hackathons, this framework enables fast prototyping using real internal data, while leaning on LLMs to generate coherent, context-aware responses.
“Wishing you all success in the AI Hackathon,” he said, concluding a practical and empowering session.