Overview

Jinba Vector Search runs semantic search over a knowledge base using vector embeddings. The tool embeds each query with OpenAI’s text-embedding-3-large model, scores knowledge base content by similarity to the query, and returns the closest matches, which makes it a retrieval step for RAG (Retrieval-Augmented Generation) workflows.
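
To make the retrieval step concrete, here is a minimal sketch of similarity search over embeddings. It assumes cosine similarity as the scoring function, which this page does not confirm, and it uses tiny hand-written vectors in place of real text-embedding-3-large output. The function names and the sample chunks are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def search(query_vec, chunks, top_k=3, threshold=0.3):
    """Score every chunk, drop those below the threshold, keep the best top_k."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]

# Toy 3-dimensional "embeddings" (hypothetical; real embeddings are far larger).
chunks = [
    ("Returns are accepted within 30 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Refunds are issued to the original payment method.", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded query

hits = search(query_vec, chunks)
for score, text in hits:
    print(f"{score:.3f}  {text}")
```

The two return-related chunks score well above the default threshold, while the unrelated chunk is filtered out even though topK would have allowed a third result.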

Key Features

  • Semantic Search: Find conceptually similar content, not just exact matches
  • Knowledge Base Integration: Search across uploaded documents and files
  • Similarity Scoring: Filter results by similarity threshold
  • Configurable Results: Control the number of results returned
  • RAG Support: Suited to building question-answering and information retrieval workflows

Authentication

This tool requires a Jinba API Token to access knowledge bases.

Required Configuration:
  • token (string): Your Jinba API Token (stored as a secret)

Input Parameters

Parameter        Type    Required  Default  Description
query            string  Yes       -        Search query for semantic search
knowledgeBaseId  string  Yes       -        ID of the knowledge base to search
topK             number  No        3        Number of top results to return (1-50)
threshold        number  No        0.3      Similarity threshold for filtering results (0-1)
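
As a sketch of how the documented ranges and defaults could be checked before a workflow runs, the hypothetical helper below validates the four parameters and returns them as name/value pairs shaped like the input section of a step. The helper name, the error messages, and the knowledge base ID are assumptions. Treating topK as a whole number is also an assumption, since the table only says "number".

```python
def build_search_input(query, knowledge_base_id, top_k=3, threshold=0.3):
    """Validate the documented ranges and return name/value pairs for the step input."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(knowledge_base_id, str) or not knowledge_base_id.strip():
        raise ValueError("knowledgeBaseId must be a non-empty string")
    # Assumption: topK is a whole number; bool is excluded because it subclasses int.
    if isinstance(top_k, bool) or not isinstance(top_k, int) or not 1 <= top_k <= 50:
        raise ValueError("topK must be an integer from 1 to 50")
    if (isinstance(threshold, bool)
            or not isinstance(threshold, (int, float))
            or not 0 <= threshold <= 1):
        raise ValueError("threshold must be a number from 0 to 1")
    return [
        {"name": "query", "value": query},
        {"name": "knowledgeBaseId", "value": knowledge_base_id},
        {"name": "topK", "value": top_k},
        {"name": "threshold", "value": threshold},
    ]

# "kb_example" is a made-up ID used only for illustration.
step_input = build_search_input("What is our return policy?", "kb_example")
print(step_input)
```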

Output Structure

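Returns the search results together with information about the query. The examples on this page read query and totalResults at the top level of the output, and read chunk, score, and file from each entry in results.

Field         Type    Description
chunk         object  Content chunk with ID, file ID, content, and metadata
score         number  Similarity score (0-1, higher is more similar)
file          object  File information including filename and content type
query         string  The original search query
totalResults  number  Total number of results found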
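
The following sketch shows one plausible shape for the output, inferred from how the examples on this page read it. The values are invented for illustration. The key names id and fileId inside chunk are guesses, since the table only says the chunk carries an ID and a file ID. The keys content, filename, and contentType are the ones the examples use.

```python
# Illustrative output only; not captured from a real search.
sample_output = {
    "query": "What is our company's return policy?",
    "totalResults": 2,
    "results": [
        {
            "chunk": {
                "id": "chunk_1",        # assumed key name
                "fileId": "file_1",     # assumed key name
                "content": "Items may be returned within 30 days of delivery.",
                "metadata": {},
            },
            "score": 0.82,
            "file": {"filename": "returns.pdf", "contentType": "application/pdf"},
        },
        {
            "chunk": {
                "id": "chunk_2",
                "fileId": "file_2",
                "content": "Refunds are issued to the original payment method.",
                "metadata": {},
            },
            "score": 0.57,
            "file": {"filename": "faq.txt", "contentType": "text/plain"},
        },
    ],
}

# Walk the entries the same way the examples on this page do.
for entry in sample_output["results"]:
    print(f"{entry['score']:.2f}  {entry['file']['filename']}: {entry['chunk']['content']}")
```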

Example: Document Q&A System

- id: search_knowledge_base
  name: search_knowledge_base
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "What is our company's return policy?"
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: topK
      value: 5
    - name: threshold
      value: 0.4

- id: generate_answer
  name: generate_answer
  tool: OPENAI_INVOKE
  config:
    - name: version
      value: gpt-4
  input:
    - name: prompt
      value: |
        Based on the following search results from our knowledge base, answer the user's question about our return policy.
        
        Search Results:
        {{#each steps.search_knowledge_base.results}}
        **From: {{file.filename}}**
        Content: {{chunk.content}}
        Similarity: {{score}}
        
        {{/each}}
        
        Question: {{steps.search_knowledge_base.query}}
        
        Please provide a comprehensive answer based on the search results above.
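
To show what the prompt template above produces once the loop over the results is expanded, here is a rough Python equivalent. It assumes the search output has a results list plus a top-level query, as the template paths suggest. The function name and the sample data are hypothetical.

```python
def render_prompt(output):
    """Expand the search results into the text block that precedes the question."""
    lines = ["Search Results:"]
    for entry in output["results"]:
        lines.append(f"**From: {entry['file']['filename']}**")
        lines.append(f"Content: {entry['chunk']['content']}")
        lines.append(f"Similarity: {entry['score']}")
        lines.append("")  # blank line between entries, as in the template
    lines.append(f"Question: {output['query']}")
    return "\n".join(lines)

# Invented sample data for illustration.
output = {
    "query": "What is our company's return policy?",
    "results": [
        {
            "file": {"filename": "returns.pdf"},
            "chunk": {"content": "Items may be returned within 30 days."},
            "score": 0.82,
        },
    ],
}

prompt = render_prompt(output)
print(prompt)
```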

Example: Research Assistant

- id: research_query
  name: research_query
  tool: INPUT_TEXT
  input:
    - name: description
      value: "Enter your research question"

- id: search_research_base
  name: search_research_base
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "{{steps.research_query.result}}"
    - name: knowledgeBaseId
      value: "{{secrets.RESEARCH_KB_ID}}"
    - name: topK
      value: 10
    - name: threshold
      value: 0.3

- id: analyze_findings
  name: analyze_findings
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
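        # Assumes this placeholder expands to the full search output,
        # with 'query', 'totalResults', and 'results' keys.
        results = {{steps.search_research_base.results}}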
        
        print("=== Research Findings ===")
        print(f"Query: {results['query']}")
        print(f"Total Results: {results['totalResults']}")
        print()
        
        # Group results by source file
        sources = {}
        for result in results['results']:
            filename = result['file']['filename']
            if filename not in sources:
                sources[filename] = []
            sources[filename].append({
                'content': result['chunk']['content'][:200] + "...",
                'score': result['score']
            })
        
        print("=== Sources Found ===")
        for filename, chunks in sources.items():
            print(f"\n📄 {filename}")
            for i, chunk in enumerate(chunks[:3]):  # Top 3 chunks per file
                print(f"  {i+1}. Score: {chunk['score']:.3f}")
                print(f"     {chunk['content']}")
        
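        # Calculate average similarity (skipped when nothing matched, to avoid dividing by zero)
        if results['results']:
            avg_score = sum(r['score'] for r in results['results']) / len(results['results'])
            print(f"\n📊 Average Similarity: {avg_score:.3f}")
        else:
            print("\nNo results matched the query.")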

- id: synthesize_response
  name: synthesize_response
  tool: ANTHROPIC_INVOKE
  config:
    - name: token
      value: "{{secrets.ANTHROPIC_API_KEY}}"
  input:
    - name: prompt
      value: |
        You are a research assistant. Synthesize the following search results into a comprehensive response.
        
        Original Query: {{steps.search_research_base.query}}
        
        Search Results:
        {{#each steps.search_research_base.results}}
        
        **Source: {{file.filename}}** (Similarity: {{score}})
        {{chunk.content}}
        
        ---
        {{/each}}
        
        Please provide:
        1. A direct answer to the question
        2. Key insights from multiple sources
        3. Any conflicting information found
        4. Suggestions for further research

Example: Content Recommendation

- id: get_user_interests
  name: get_user_interests
  tool: INPUT_TEXT
  input:
    - name: description
      value: "Describe topics or content you're interested in"

- id: find_related_content
  name: find_related_content
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "{{steps.get_user_interests.result}}"
    - name: knowledgeBaseId
      value: "{{secrets.CONTENT_KB_ID}}"
    - name: topK
      value: 8
    - name: threshold
      value: 0.25

- id: format_recommendations
  name: format_recommendations
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        # Assumes this placeholder expands to the full search output,
        # with 'query' and 'results' keys.
        results = {{steps.find_related_content.results}}
        
        print("🔍 **Content Recommendations**")
        print(f"Based on: *{results['query']}*\n")
        
        recommendations = []
        for i, result in enumerate(results['results'][:5], 1):
            chunk = result['chunk']
            file_info = result['file']
            score = result['score']
            
            # Extract first sentence or 100 characters
            content_preview = chunk['content'][:100].split('.')[0] + "..."
            
            recommendations.append({
                'rank': i,
                'title': file_info['filename'],
                'preview': content_preview,
                'relevance': f"{score:.1%}",
                'content_type': file_info['contentType']
            })
        
        for rec in recommendations:
            print(f"**{rec['rank']}. {rec['title']}**")
            print(f"   📋 {rec['preview']}")
            print(f"   🎯 Relevance: {rec['relevance']}")
            print(f"   📄 Type: {rec['content_type']}")
            print()
        
        if len(results['results']) == 0:
            print("No relevant content found. Try adjusting your search terms or lowering the threshold.")

Best Practices

Query Optimization

  • Use natural language: Write queries as you would ask a human
  • Be specific: More specific queries often yield better results
  • Include context: Add relevant keywords and context terms

Threshold Selection

  • 0.7-1.0: Very high similarity, exact or near-exact matches
  • 0.4-0.7: High similarity, closely related content
  • 0.2-0.4: Moderate similarity, potentially relevant content
  • 0.0-0.2: Low similarity, may include tangentially related content
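
The bands above can be turned into a small labelling helper, sketched below. The band labels are shortened from the list, and the function name is hypothetical. Because the listed ranges share their endpoints, this sketch assigns a boundary value such as 0.4 to the higher band, which is a choice made here and not something the list specifies.

```python
def describe_score(score):
    """Map a similarity score in [0, 1] to the band it falls into."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.7:
        return "very high"
    if score >= 0.4:
        return "high"
    if score >= 0.2:
        return "moderate"
    return "low"

for value in (0.85, 0.4, 0.25, 0.1):
    print(value, describe_score(value))
```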

Performance Tips

  • Limit topK: Don’t retrieve more results than needed
  • Adjust threshold: Higher thresholds return fewer, more relevant results
  • Use metadata: Leverage chunk metadata for additional filtering
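
One way to use chunk metadata is to filter the returned results in a later step, as sketched below. This is a client-side filter applied after the search; whether the tool supports metadata filtering during the search itself is not stated on this page. The metadata key department, the function name, and the sample entries are hypothetical.

```python
def filter_by_metadata(results, **required):
    """Keep entries whose chunk metadata matches every required key/value pair."""
    kept = []
    for entry in results:
        metadata = entry["chunk"].get("metadata") or {}
        if all(metadata.get(key) == value for key, value in required.items()):
            kept.append(entry)
    return kept

# Invented sample entries for illustration.
results = [
    {"chunk": {"content": "Refund steps", "metadata": {"department": "support"}}, "score": 0.71},
    {"chunk": {"content": "Quarterly report", "metadata": {"department": "finance"}}, "score": 0.64},
    {"chunk": {"content": "Return labels", "metadata": {"department": "support"}}, "score": 0.52},
]

support_only = filter_by_metadata(results, department="support")
print([entry["chunk"]["content"] for entry in support_only])
```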

Knowledge Base Setup

Before using vector search, ensure your knowledge base contains relevant documents:
  1. Upload Documents: Add PDFs, text files, or other supported formats
  2. Processing: Allow time for document chunking and vectorization
  3. Test Queries: Start with simple queries to understand your data
  4. Iterate: Refine queries and thresholds based on results
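
For the iteration step, it can help to see how many results would survive at different thresholds. The sketch below assumes you have collected scores from one test query run with a low threshold; the scores shown are invented and the function name is hypothetical.

```python
def count_by_threshold(scores, thresholds):
    """For each candidate threshold, count the scores that would pass it."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}

# Invented scores standing in for one test query's results.
scores = [0.81, 0.62, 0.44, 0.31, 0.27, 0.12]

counts = count_by_threshold(scores, [0.2, 0.3, 0.4, 0.7])
for threshold, count in counts.items():
    print(f"threshold {threshold}: {count} results")
```

A sharp drop between two adjacent thresholds suggests where relevant content ends and loosely related content begins for that query.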

Use Cases

  • Customer Support: Find relevant documentation for user questions
  • Research Assistant: Discover related research papers and documents
  • Content Discovery: Recommend similar articles or resources
  • FAQ Automation: Automatically answer common questions
  • Document Analysis: Find specific information across large document sets
  • Knowledge Management: Quick access to institutional knowledge
  • Legal Research: Search through contracts and legal documents
  • Product Information: Find technical specifications and manuals