Overview

Jinba Flow’s Knowledge feature provides powerful vector database capabilities for building RAG (Retrieval-Augmented Generation) systems. You can create knowledge bases, upload documents, perform semantic search, and integrate AI-powered question-answering into your workflows.

What are Knowledge Bases?

Knowledge Bases are vector databases that store documents with automatic chunking and vectorization. They enable semantic search capabilities, allowing you to find information based on meaning rather than exact text matches.

Key Features

  • Document Storage: Upload PDFs, DOCX, text files, and other formats
  • Automatic Processing: Documents are automatically chunked and vectorized
  • Semantic Search: Find relevant information using natural language queries
  • RAG Support: Perfect for building AI-powered question-answering systems
  • Vector Embeddings: Uses OpenAI’s text-embedding-3-large for high-quality embeddings

Creating a Knowledge Base

  1. Navigate to Storage in the workspace sidebar
  2. Click on the Knowledge Bases tab
  3. Click Create Knowledge Base
  4. Enter a name and description for your knowledge base
  5. The knowledge base is created and ready for file uploads

Adding Files to Knowledge Base

Using the UI

  1. Open your knowledge base from the Storage page
  2. Click Upload File or Add File
  3. Select files from your computer or provide a URL
  4. Files are automatically processed:
    • Parsing: Extract text content from documents
    • Chunking: Split documents into manageable chunks (configurable)
    • Embedding: Convert chunks to vector embeddings
    • Indexing: Store vectors for fast similarity search

Using Workflows

You can also add files to knowledge bases programmatically using the JINBA_KNOWLEDGE_BASE_FILE_ADD tool:
- id: add_file_to_kb
  name: add_file_to_kb
  tool: JINBA_KNOWLEDGE_BASE_FILE_ADD
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: file
      value: "https://example.com/document.pdf"
    - name: filename
      value: "document.pdf"
    - name: executionMode
      value: "SYNCHRONOUS"
    - name: chunkerSettings
      value:
        chunkSize: 512
        chunkOverlap: 128
        chunkingIdentifier: "\\n\\n"

File Processing Status:
  • pending: File uploaded, waiting for processing
  • processing: File is being chunked and vectorized
  • completed: File is ready for search
  • failed: Processing encountered an error

Vector Search

Vector search enables semantic search across your knowledge base using natural language queries.

How It Works

  1. Query Vectorization: Your search query is converted to a vector using OpenAI’s text-embedding-3-large
  2. Similarity Search: The system finds chunks with similar vectors
  3. Ranking: Results are ranked by similarity score
  4. Filtering: Results below the threshold are filtered out

Example search step:
- id: search_kb
  name: search_kb
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "How do I configure authentication?"
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: topK
      value: 5
    - name: threshold
      value: 0.3

Parameters:
  • query: Natural language search query
  • knowledgeBaseId: ID of the knowledge base to search
  • topK: Number of results to return (1-50, default: 3)
  • threshold: Similarity threshold (0-1, default: 0.3)
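
A downstream step can reference the search output. Judging from the result templates used in the RAG examples later in this guide, each entry in the results array exposes file.filename, score, and chunk.content. As a minimal sketch (the step name and prompt are illustrative, not part of the tool), the hits from search_kb above could be passed to an AI step:
- id: summarize_results
  name: summarize_results
  tool: ANTHROPIC_INVOKE
  config:
    - name: version
      value: claude-3-5-sonnet-20241022
  input:
    - name: prompt
      value: |
        Briefly summarize what the following knowledge base excerpts say about the query.
        
        {{#each steps.search_kb.result.results}}
        - {{file.filename}} (score {{score}}): {{chunk.content}}
        {{/each}}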

Building RAG (Retrieval-Augmented Generation) Systems

RAG combines retrieval of relevant information from knowledge bases with AI generation to create accurate, context-aware responses.

What is RAG?

RAG (Retrieval-Augmented Generation) is a technique that:
  1. Retrieves relevant information from a knowledge base using semantic search
  2. Augments the AI prompt with retrieved context
  3. Generates responses based on both the query and retrieved context

This approach allows AI models to answer questions using your own documents and data, rather than relying solely on their training data.

Step-by-Step: Building a RAG System

Step 1: Create a Knowledge Base

  1. Go to Storage → Knowledge Bases
  2. Create a new knowledge base (e.g., “Company Documentation”)
  3. Note the knowledge base ID for later use

Step 2: Upload Documents

  1. Open your knowledge base
  2. Upload relevant documents (PDFs, DOCX, text files)
  3. Wait for processing to complete (files show “completed” status)

Step 3: Build the RAG Workflow

Create a workflow that combines vector search with AI generation:
- id: user_question
  name: user_question
  tool: INPUT_TEXT
  input:
    - name: description
      value: "Enter your question"

- id: search_knowledge_base
  name: search_knowledge_base
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "{{steps.user_question.result}}"
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: topK
      value: 5
    - name: threshold
      value: 0.3

- id: generate_answer
  name: generate_answer
  tool: ANTHROPIC_INVOKE
  config:
    - name: version
      value: claude-3-5-sonnet-20241022
  input:
    - name: prompt
      value: |
        Based on the following information from our knowledge base, answer the user's question.
        
        Question: {{steps.user_question.result}}
        
        Relevant Information:
        {{#each steps.search_knowledge_base.result.results}}
        **Source: {{file.filename}}** (Relevance: {{score}})
        {{chunk.content}}
        
        ---
        {{/each}}
        
        Please provide a comprehensive and accurate answer based on the information above.
        If the information doesn't contain enough details to answer the question, say so.

Step 4: Configure Secrets

  1. Go to Credentials in your workspace
  2. Add your Jinba API Token as a secret
  3. Store your Knowledge Base ID as a secret (or use it directly in the workflow)
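
Both values are then read with the {{secrets.*}} template syntax used throughout the examples above. As a quick way to verify the wiring, a minimal search step could reference both secrets (the step name and query here are placeholders):
- id: smoke_test_search
  name: smoke_test_search
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "What topics does our documentation cover?"
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: topK
      value: 3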

Advanced RAG Patterns

Multi-Step RAG with Refinement

- id: initial_search
  name: initial_search
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "{{steps.user_question.result}}"
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: topK
      value: 10

- id: refine_query
  name: refine_query
  tool: ANTHROPIC_INVOKE
  config:
    - name: version
      value: claude-3-5-sonnet-20241022
  input:
    - name: prompt
      value: |
        Based on the initial search results, generate a more specific search query.
        
        Original Query: {{steps.user_question.result}}
        
        Initial Results:
        {{#each steps.initial_search.result.results}}
        - {{file.filename}}: {{chunk.content}}
        {{/each}}
        
        Generate a refined search query that will find more specific information.

- id: refined_search
  name: refined_search
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "{{steps.refine_query.result.text}}"
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: topK
      value: 5
    - name: threshold
      value: 0.4

- id: final_answer
  name: final_answer
  tool: ANTHROPIC_INVOKE
  config:
    - name: version
      value: claude-3-5-sonnet-20241022
  input:
    - name: prompt
      value: |
        Answer the user's question using the refined search results.
        
        Question: {{steps.user_question.result}}
        
        Refined Search Results:
        {{#each steps.refined_search.result.results}}
        **{{file.filename}}** (Score: {{score}})
        {{chunk.content}}
        {{/each}}
        
        Provide a detailed, accurate answer.

Managing Knowledge Base Files

Updating Files

You can update existing files in a knowledge base using the JINBA_KNOWLEDGE_BASE_UPDATE tool:
- id: update_kb_file
  name: update_kb_file
  tool: JINBA_KNOWLEDGE_BASE_UPDATE
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: knowledgeBaseFileId
      value: "{{secrets.FILE_ID}}"
    - name: fileUrl
      value: "https://example.com/updated-document.pdf"
    - name: updateType
      value: "FULL_REFRESH"
    - name: executionMode
      value: "SYNCHRONOUS"

Update Types:
  • FULL_REFRESH: Replaces the entire file with new content
  • Other update modes may be available depending on your configuration

Chunking Configuration

When adding files to knowledge bases, you can configure how documents are chunked:
  • chunkSize: Size of each chunk in tokens (default: 512, max: 8192)
  • chunkOverlap: Overlap between chunks in tokens (default: 128, max: 2048)
  • chunkingIdentifier: String used to identify chunk boundaries (default: “\n\n”)

Best Practices:
  • Smaller chunks (256-512): Better for precise information retrieval
  • Larger chunks (1024-2048): Better for maintaining context (see the sketch below)
  • Overlap: Helps maintain context across chunk boundaries
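
For example, to favor context retention over retrieval precision, you might raise the chunk size and overlap when adding a file. This sketch reuses the JINBA_KNOWLEDGE_BASE_FILE_ADD step from earlier with larger values (the URL and filename are placeholders; keep values within the limits above):
- id: add_file_large_chunks
  name: add_file_large_chunks
  tool: JINBA_KNOWLEDGE_BASE_FILE_ADD
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: file
      value: "https://example.com/long-report.pdf"
    - name: filename
      value: "long-report.pdf"
    - name: executionMode
      value: "SYNCHRONOUS"
    - name: chunkerSettings
      value:
        chunkSize: 1024
        chunkOverlap: 256
        chunkingIdentifier: "\\n\\n"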

RAG Best Practices

  1. Quality Knowledge Base: Upload high-quality, relevant documents
  2. Appropriate Chunking: Choose chunk sizes that suit your content (typically 512-1024 tokens)
  3. Threshold Tuning: Adjust similarity thresholds based on your use case
  4. TopK Selection: Retrieve enough context (typically 3-10 chunks) for comprehensive answers
  5. Prompt Engineering: Craft prompts that clearly instruct the AI to use retrieved context
  6. Source Attribution: Always cite sources for transparency and verification
  7. Error Handling: Handle cases where no relevant information is found (see the prompt sketch below)
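
Points 5 through 7 can largely be handled in the prompt itself. This sketch adapts the generate_answer step from Step 3 so the model cites sources and falls back gracefully when nothing relevant was retrieved (the exact wording is illustrative):
- id: generate_answer
  name: generate_answer
  tool: ANTHROPIC_INVOKE
  config:
    - name: version
      value: claude-3-5-sonnet-20241022
  input:
    - name: prompt
      value: |
        Answer the question using ONLY the context below, and cite the source
        filename for each claim. If the context is empty or does not answer the
        question, say that the knowledge base does not cover it.
        
        Question: {{steps.user_question.result}}
        
        Context:
        {{#each steps.search_knowledge_base.result.results}}
        [{{file.filename}}, score {{score}}]
        {{chunk.content}}
        {{/each}}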

Use Cases for RAG

  • Customer Support: Answer questions using product documentation
  • Internal Knowledge: Access company policies and procedures
  • Research Assistant: Search through research papers and documents
  • Legal Document Q&A: Answer questions about contracts and legal documents
  • Technical Documentation: Help developers find information in technical docs
  • Product Information: Answer questions about product specifications