Reduced OpenAI RAG costs by 70% by using a pre-check API call

I'm using OpenAI's RAG implementation for my product. I tried building it myself with Pinecone but could never get it to retrieve relevant info. The downside is that OpenAI's version is costly: they charge for embeddings and for "file search", which embeds the question into a vector and retrieves the relevant chunks via similarity search. But not every question a user asks actually needs retrieved context, and that retrieval is the expensive part. So I added a pre-step that uses a cheaper OpenAI model to decide whether the question needs context; if it doesn't, the RAG pipeline isn't touched at all. This cut costs by 70%, making the business viable (or at least more lucrative). A rough sketch of the routing step is below.
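For anyone curious what the pre-check looks like, here's a minimal sketch assuming the OpenAI Python SDK. The model names, prompt, and the answer_with_file_search helper are placeholders I made up to illustrate the idea, not the exact setup from the post:

    # Hypothetical sketch of the pre-check routing step (OpenAI Python SDK assumed).
    from openai import OpenAI

    client = OpenAI()

    ROUTER_PROMPT = (
        "You decide whether a user question needs documents retrieved from a "
        "knowledge base to be answered. Reply with exactly one word: "
        "RETRIEVE or NO_RETRIEVE."
    )

    def needs_context(question: str) -> bool:
        """Ask a cheap model whether the question requires retrieval."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # cheap router model (assumed, swap for your own)
            messages=[
                {"role": "system", "content": ROUTER_PROMPT},
                {"role": "user", "content": question},
            ],
            max_tokens=3,
            temperature=0,
        )
        return "RETRIEVE" in resp.choices[0].message.content.upper()

    def answer(question: str) -> str:
        if needs_context(question):
            # Only now pay for embeddings + file search (the expensive RAG path).
            return answer_with_file_search(question)  # hypothetical RAG helper
        # Cheap path: answer directly, no retrieval costs incurred.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content

The point is just that the router call is tiny (a few output tokens at a low temperature), so its cost is negligible next to the embedding + file search calls it lets you skip.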

