Overview
Jinba Vector Search enables semantic search over knowledge bases using vector embeddings. The tool embeds each query with OpenAI's text-embedding-3-large model, compares it against the embedded chunks in your knowledge base, and returns the chunks whose similarity scores pass the configured threshold. This makes it the retrieval step in RAG (Retrieval-Augmented Generation) workflows.
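To make the retrieval step concrete, here is a minimal sketch of similarity search over precomputed vectors. It assumes cosine similarity as the score, and that results are filtered by the threshold and then capped at topK. The tool's actual scoring and ordering are not documented on this page. The tiny hand-written vectors stand in for real text-embedding-3-large embeddings, and `vector_search` is a hypothetical helper, not a Jinba API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query_vec, chunks, top_k=3, threshold=0.3):
    """Hypothetical stand-in: up to top_k (chunk_id, score) pairs scoring >= threshold.

    `chunks` maps a chunk ID to its embedding vector.
    """
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]  # assumed: filter first
    kept.sort(key=lambda pair: pair[1], reverse=True)         # best match first
    return kept[:top_k]                                       # then cap at top_k


# Toy 3-dimensional "embeddings"; real ones have thousands of dimensions.
chunks = {
    "returns-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.5, 0.5, 0.0],
    "office-hours": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]
for chunk_id, score in vector_search(query, chunks, top_k=2, threshold=0.3):
    print(f"{chunk_id}: {score:.3f}")
```

Here "office-hours" scores 0 against the query and is dropped by the threshold before the topK cap is applied.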
Key Features
- Semantic Search: Find conceptually similar content, not just exact matches
- Knowledge Base Integration: Search across uploaded documents and files
- Similarity Scoring: Filter results by similarity threshold
- Configurable Results: Control the number of results returned
- RAG Support: Supplies retrieved context to question-answering and information-retrieval workflows
Authentication
This tool requires a Jinba API Token to access knowledge bases.
Required Configuration:
- token (string): Your Jinba API Token (stored as a secret)
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Search query for semantic search |
| knowledgeBaseId | string | Yes | - | ID of the knowledge base to search |
| topK | number | No | 3 | Number of top results to return (1-50) |
| threshold | number | No | 0.3 | Similarity threshold for filtering out low-scoring results (0-1) |
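The defaults and ranges in the table can be captured in a small validation helper, for example in a custom step that prepares inputs before the search runs. This is a hypothetical sketch: `SearchParams`, `build_search_params` and the ID `"kb_123"` are illustrative names, not part of the Jinba API. The page does not say whether the tool rejects or clamps out-of-range values; this sketch rejects them.

```python
from dataclasses import dataclass


@dataclass
class SearchParams:
    """Hypothetical container mirroring the documented input parameters."""
    query: str
    knowledgeBaseId: str
    topK: int = 3           # documented default; allowed range 1-50
    threshold: float = 0.3  # documented default; allowed range 0-1


def build_search_params(query, knowledge_base_id, top_k=3, threshold=0.3):
    """Check inputs against the documented ranges and return SearchParams."""
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    if not knowledge_base_id:
        raise ValueError("knowledgeBaseId is required")
    if not 1 <= top_k <= 50:
        raise ValueError(f"topK must be between 1 and 50, got {top_k}")
    if not 0 <= threshold <= 1:
        raise ValueError(f"threshold must be between 0 and 1, got {threshold}")
    return SearchParams(query, knowledge_base_id, top_k, threshold)


params = build_search_params("What is our return policy?", "kb_123")
print(params)
```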
Output Structure
Returns the matching results together with the original query and a result count:
| Field | Type | Description |
|---|---|---|
| results | array | Matching results; each item contains chunk, score, and file |
| results[].chunk | object | Content chunk with ID, file ID, content, and metadata |
| results[].score | number | Similarity score (0-1, higher is more similar) |
| results[].file | object | File information, including filename and contentType |
| query | string | The original search query |
| totalResults | number | Total number of results found |
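For downstream code steps, the output fields above can be written down as Python type hints. This sketch is inferred from the table and the examples on this page: only `content` (in chunk) and `filename` and `contentType` (in file) are used there, so other key names inside `chunk` and `file` are left open rather than guessed. `best_match` is a hypothetical helper and the sample payload is made up.

```python
from typing import Any, Optional, TypedDict


class SearchResult(TypedDict):
    chunk: dict[str, Any]  # includes "content"; also ID, file ID, metadata
    score: float           # 0-1, higher is more similar
    file: dict[str, Any]   # includes "filename" and "contentType"


class SearchOutput(TypedDict):
    results: list[SearchResult]
    query: str
    totalResults: int


def best_match(output: SearchOutput) -> Optional[SearchResult]:
    """Return the highest-scoring result, or None if nothing passed the threshold."""
    if not output["results"]:
        return None
    return max(output["results"], key=lambda r: r["score"])


# Made-up sample payload for illustration only.
sample: SearchOutput = {
    "results": [
        {"chunk": {"content": "Unused items may be returned."}, "score": 0.62,
         "file": {"filename": "returns.pdf", "contentType": "application/pdf"}},
        {"chunk": {"content": "Refunds go to the original payment method."}, "score": 0.48,
         "file": {"filename": "refunds.md", "contentType": "text/markdown"}},
    ],
    "query": "What is our return policy?",
    "totalResults": 2,
}
top = best_match(sample)
if top is not None:
    print(top["file"]["filename"], top["score"])
```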
Example: Document Q&A System
```yaml
- id: search_knowledge_base
  name: search_knowledge_base
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "What is our company's return policy?"
    - name: knowledgeBaseId
      value: "{{secrets.KNOWLEDGE_BASE_ID}}"
    - name: topK
      value: 5
    - name: threshold
      value: 0.4

- id: generate_answer
  name: generate_answer
  tool: OPENAI_INVOKE
  config:
    - name: version
      value: gpt-4
  input:
    - name: prompt
      value: |
        Based on the following search results from our knowledge base, answer the user's question about our return policy.

        Search Results:
        {{#each steps.search_knowledge_base.results}}
        **From: {{file.filename}}**
        Content: {{chunk.content}}
        Similarity: {{score}}
        {{/each}}

        Question: {{steps.search_knowledge_base.query}}

        Please provide a comprehensive answer based on the search results above.
```
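The {{#each}} ... {{/each}} section of the prompt repeats its body once per entry in the search results. As a rough Python equivalent, assuming the results are a list of dicts shaped like the Output Structure table, the context part of the prompt is assembled roughly like this (`format_context` is a hypothetical helper, not part of Jinba, and the template engine's exact whitespace handling may differ):

```python
def format_context(results: list[dict]) -> str:
    """Mirror the each-loop: one From/Content/Similarity block per result."""
    blocks = []
    for r in results:
        blocks.append(
            f"**From: {r['file']['filename']}**\n"
            f"Content: {r['chunk']['content']}\n"
            f"Similarity: {r['score']}"
        )
    return "\n".join(blocks)


# Made-up results for illustration only.
results = [
    {"file": {"filename": "returns.pdf"}, "chunk": {"content": "Unused items may be returned."}, "score": 0.62},
    {"file": {"filename": "faq.md"}, "chunk": {"content": "Refunds go to the original payment method."}, "score": 0.47},
]
print(format_context(results))
```

Seeing the expanded text makes it easier to judge how much prompt space a given topK will consume.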
Example: Research Assistant
```yaml
- id: research_query
  name: research_query
  tool: INPUT_TEXT
  input:
    - name: description
      value: "Enter your research question"

- id: search_research_base
  name: search_research_base
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "{{steps.research_query.result}}"
    - name: knowledgeBaseId
      value: "{{secrets.RESEARCH_KB_ID}}"
    - name: topK
      value: 10
    - name: threshold
      value: 0.3

- id: analyze_findings
  name: analyze_findings
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        results = {{steps.search_research_base.results}}

        print("=== Research Findings ===")
        print(f"Query: {results['query']}")
        print(f"Total Results: {results['totalResults']}")
        print()

        # Group result chunks by source file
        sources = {}
        for result in results['results']:
            filename = result['file']['filename']
            sources.setdefault(filename, []).append({
                'content': result['chunk']['content'][:200] + "...",
                'score': result['score'],
            })

        print("=== Sources Found ===")
        for filename, chunks in sources.items():
            print(f"\n📄 {filename}")
            for i, chunk in enumerate(chunks[:3], 1):  # At most 3 chunks per file
                print(f"  {i}. Score: {chunk['score']:.3f}")
                print(f"     {chunk['content']}")

        # Average similarity; skipped when nothing passed the threshold
        # (dividing by len() of an empty list would raise ZeroDivisionError)
        if results['results']:
            avg_score = sum(r['score'] for r in results['results']) / len(results['results'])
            print(f"\n📊 Average Similarity: {avg_score:.3f}")
        else:
            print("No results passed the similarity threshold.")

- id: synthesize_response
  name: synthesize_response
  tool: ANTHROPIC_INVOKE
  config:
    - name: token
      value: "{{secrets.ANTHROPIC_API_KEY}}"
  input:
    - name: prompt
      value: |
        You are a research assistant. Synthesize the following search results into a comprehensive response.

        Original Query: {{steps.search_research_base.query}}

        Search Results:
        {{#each steps.search_research_base.results}}
        **Source: {{file.filename}}** (Similarity: {{score}})
        {{chunk.content}}
        ---
        {{/each}}

        Please provide:
        1. A direct answer to the question
        2. Key insights from multiple sources
        3. Any conflicting information found
        4. Suggestions for further research
```
Example: Content Recommendation
```yaml
- id: get_user_interests
  name: get_user_interests
  tool: INPUT_TEXT
  input:
    - name: description
      value: "Describe topics or content you're interested in"

- id: find_related_content
  name: find_related_content
  tool: JINBA_VECTOR_SEARCH
  config:
    - name: token
      value: "{{secrets.JINBA_API_TOKEN}}"
  input:
    - name: query
      value: "{{steps.get_user_interests.result}}"
    - name: knowledgeBaseId
      value: "{{secrets.CONTENT_KB_ID}}"
    - name: topK
      value: 8
    - name: threshold
      value: 0.25

- id: format_recommendations
  name: format_recommendations
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        results = {{steps.find_related_content.results}}

        print("🔍 **Content Recommendations**")
        print(f"Based on: *{results['query']}*\n")

        recommendations = []
        for i, result in enumerate(results['results'][:5], 1):
            chunk = result['chunk']
            file_info = result['file']
            score = result['score']

            # Extract first sentence or 100 characters, whichever is shorter
            content_preview = chunk['content'][:100].split('.')[0] + "..."

            recommendations.append({
                'rank': i,
                'title': file_info['filename'],
                'preview': content_preview,
                'relevance': f"{score:.1%}",
                'content_type': file_info['contentType']
            })

        for rec in recommendations:
            print(f"**{rec['rank']}. {rec['title']}**")
            print(f"   📋 {rec['preview']}")
            print(f"   🎯 Relevance: {rec['relevance']}")
            print(f"   📄 Type: {rec['content_type']}")
            print()

        if len(results['results']) == 0:
            print("No relevant content found. Try adjusting your search terms or lowering the threshold.")
```
Best Practices
Query Optimization
- Use natural language: Write queries as you would ask a human
- Be specific: More specific queries often yield better results
- Include context: Add relevant keywords and context terms
Threshold Selection
- 0.7-1.0: Very high similarity, exact or near-exact matches
- 0.4-0.7: High similarity, closely related content
- 0.2-0.4: Moderate similarity, potentially relevant content
- 0.0-0.2: Low similarity, may include tangentially related content
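When inspecting scores by hand, the bands above can be encoded as a simple lookup. This sketch assigns a shared endpoint such as 0.7 to the higher band, and the labels are guidance from this page, not values the tool returns; `similarity_band` is a hypothetical helper.

```python
def similarity_band(score: float) -> str:
    """Map a 0-1 similarity score to the guidance bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score >= 0.7:
        return "very high"   # exact or near-exact matches
    if score >= 0.4:
        return "high"        # closely related content
    if score >= 0.2:
        return "moderate"    # potentially relevant content
    return "low"             # tangentially related content


for s in (0.82, 0.55, 0.31, 0.12):
    print(s, similarity_band(s))
```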
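- Limit topK: Retrieve only as many results as the next step needs; extra chunks lengthen downstream prompts
- Adjust threshold: Higher thresholds return fewer, more closely related results
- Use metadata: Filter on chunk metadata in a downstream step when similarity scores alone are not selective enough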
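The threshold and topK settings interact: the threshold removes weak matches, and topK caps how many of the remaining matches come back. The sketch below sweeps several thresholds over one hypothetical list of scores to show the effect. It assumes the threshold is applied before the topK cap and that a score equal to the threshold is kept; this page states neither. `returned_count` is a hypothetical helper.

```python
def returned_count(scores: list[float], top_k: int, threshold: float) -> int:
    """Number of results left after the threshold filter and the topK cap."""
    passing = [s for s in scores if s >= threshold]
    return min(len(passing), top_k)


# Hypothetical similarity scores for one query against a knowledge base.
scores = [0.81, 0.66, 0.52, 0.44, 0.37, 0.29, 0.22, 0.15]

for threshold in (0.2, 0.3, 0.4, 0.5, 0.7):
    n = returned_count(scores, top_k=5, threshold=threshold)
    print(f"threshold={threshold:.1f} -> {n} result(s)")
```

With these scores, topK is the binding limit at low thresholds (0.2 and 0.3 both return 5), while from 0.4 upward the threshold alone decides how many results survive.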
Knowledge Base Setup
Before using vector search, ensure your knowledge base contains relevant documents:
- Upload Documents: Add PDFs, text files, or other supported formats
- Processing: Allow time for document chunking and vectorization
- Test Queries: Start with simple queries to understand your data
- Iterate: Refine queries and thresholds based on results
Use Cases
- Customer Support: Find relevant documentation for user questions
- Research Assistant: Discover related research papers and documents
- Content Discovery: Recommend similar articles or resources
- FAQ Automation: Automatically answer common questions
- Document Analysis: Find specific information across large document sets
- Knowledge Management: Quick access to institutional knowledge
- Legal Research: Search through contracts and legal documents
- Product Information: Find technical specifications and manuals