Documentation Index
Fetch the complete documentation index at: https://docs.jinba.io/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Advanced Google Sheets tools provide enhanced integration capabilities with Google Sheets, including the ability to retrieve complete sheet data. This complements the existing Google Sheets and Google Spreadsheet tools with additional functionality for comprehensive data access.
Key Features
GOOGLE_SHEETS_GET_SHEET
- Retrieve complete sheet data from Google Sheets
- Access to all cells and formatting information
- Support for multiple sheets within a workbook
- Efficient data retrieval for large datasets
Authentication
Google Sheets integration requires Google API credentials and appropriate permissions.
Required setup:
- Google Cloud Console project with Sheets API enabled
- Service account credentials or OAuth 2.0 client
- Appropriate sharing permissions on target sheets
Note: Treat API credentials as sensitive information and never commit them to public repositories.
Example: Complete Sheet Data Retrieval
- id: get_full_sheet_data
name: get_full_sheet_data
tool: GOOGLE_SHEETS_GET_SHEET
config:
- name: credentials
value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
input:
- name: sheet_id
value: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
- name: sheet_name
value: "Sheet1" # Optional: defaults to first sheet
- name: include_formatting
value: true
- name: include_metadata
value: true
- id: analyze_sheet_structure
name: analyze_sheet_structure
tool: PYTHON_SANDBOX_RUN
input:
- name: code
value: |
import json
# Analyze retrieved sheet data
sheet_data = json.loads('''{{steps.get_full_sheet_data.result.sheet_data}}''')
print("Google Sheet Analysis")
print("=" * 25)
# Basic information
print(f"Sheet Name: {sheet_data.get('sheet_name', 'Unknown')}")
print(f"Sheet ID: {sheet_data.get('sheet_id', 'Unknown')}")
# Data dimensions
rows = sheet_data.get('rows', [])
if rows:
max_cols = max(len(row) for row in rows)
print(f"Dimensions: {len(rows)} rows × {max_cols} columns")
# Data preview
print("\nData Preview (first 5 rows):")
for i, row in enumerate(rows[:5], 1):
print(f"Row {i}: {row[:5]}...") # Show first 5 columns
else:
print("No data found in sheet")
# Metadata
if 'metadata' in sheet_data:
metadata = sheet_data['metadata']
print(f"\nMetadata:")
print(f" Last Modified: {metadata.get('modified_time', 'Unknown')}")
print(f" Owner: {metadata.get('owner', 'Unknown')}")
- id: convert_to_structured_format
name: convert_to_structured_format
tool: PYTHON_SANDBOX_RUN
input:
- name: code
value: |
import json
sheet_data = json.loads('''{{steps.get_full_sheet_data.result.sheet_data}}''')
rows = sheet_data.get('rows', [])
if not rows:
print("No data to convert")
else:
# Assume first row contains headers
headers = rows[0] if rows else []
data_rows = rows[1:] if len(rows) > 1 else []
# Convert to array of objects
structured_data = []
for row in data_rows:
row_obj = {}
for i, header in enumerate(headers):
value = row[i] if i < len(row) else ""
row_obj[header] = value
structured_data.append(row_obj)
# Output structured data
result = {
"headers": headers,
"data": structured_data,
"record_count": len(structured_data)
}
print(json.dumps(result, indent=2))
- id: export_structured_data
name: export_structured_data
tool: OUTPUT_FILE
input:
- name: content
value: "{{steps.convert_to_structured_format.result.stdout}}"
- name: filename
value: "google_sheets_export_{{date | format('YYYY-MM-DD')}}.json"
- name: fileType
value: "json"
Example: Multi-Sheet Workbook Processing
- id: get_workbook_info
name: get_workbook_info
tool: GOOGLE_SHEETS_GET_SHEET
config:
- name: credentials
value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
input:
- name: sheet_id
value: "{{input.workbook_id}}"
- name: get_sheet_list
value: true # Get list of all sheets in workbook
- id: process_all_sheets
name: process_all_sheets
tool: PYTHON_SANDBOX_RUN
input:
- name: code
value: |
import json
# Get list of sheets from workbook
workbook_info = json.loads('''{{steps.get_workbook_info.result.workbook_info}}''')
sheets = workbook_info.get('sheets', [])
print("Multi-Sheet Processing Plan")
print("=" * 30)
print(f"Workbook ID: {workbook_info.get('workbook_id', 'Unknown')}")
print(f"Total Sheets: {len(sheets)}")
print("\nSheets to process:")
processing_plan = []
for i, sheet in enumerate(sheets):
sheet_name = sheet.get('name', f'Sheet{i+1}')
sheet_info = {
"sheet_name": sheet_name,
"step_id": f"get_sheet_{i+1}",
"processing_order": i+1
}
processing_plan.append(sheet_info)
print(f" {i+1}. {sheet_name}")
print(f"\nProcessing plan created for {len(processing_plan)} sheets")
print(json.dumps(processing_plan, indent=2))
- id: get_sales_sheet
name: get_sales_sheet
tool: GOOGLE_SHEETS_GET_SHEET
config:
- name: credentials
value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
input:
- name: sheet_id
value: "{{input.workbook_id}}"
- name: sheet_name
value: "Sales Data"
- name: include_formatting
value: true
- id: get_inventory_sheet
name: get_inventory_sheet
tool: GOOGLE_SHEETS_GET_SHEET
config:
- name: credentials
value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
input:
- name: sheet_id
value: "{{input.workbook_id}}"
- name: sheet_name
value: "Inventory"
- name: include_formatting
value: true
- id: consolidate_workbook_data
name: consolidate_workbook_data
tool: PYTHON_SANDBOX_RUN
input:
- name: code
value: |
import json
from datetime import datetime
# Consolidate data from multiple sheets
sales_data = json.loads('''{{steps.get_sales_sheet.result.sheet_data}}''')
inventory_data = json.loads('''{{steps.get_inventory_sheet.result.sheet_data}}''')
consolidated_report = {
"report_date": datetime.now().isoformat(),
"workbook_id": "{{input.workbook_id}}",
"sheets_processed": [],
"summary_statistics": {},
"data_quality_checks": []
}
# Process sales data
if sales_data.get('rows'):
sales_rows = len(sales_data['rows']) - 1 # Exclude header
consolidated_report["sheets_processed"].append({
"sheet_name": "Sales Data",
"row_count": sales_rows,
"status": "processed"
})
# Process inventory data
if inventory_data.get('rows'):
inventory_rows = len(inventory_data['rows']) - 1 # Exclude header
consolidated_report["sheets_processed"].append({
"sheet_name": "Inventory",
"row_count": inventory_rows,
"status": "processed"
})
# Calculate summary statistics
total_rows = sum(sheet["row_count"] for sheet in consolidated_report["sheets_processed"])
consolidated_report["summary_statistics"] = {
"total_data_rows": total_rows,
"sheets_processed": len(consolidated_report["sheets_processed"]),
"processing_time": "{{date | format('HH:mm:ss')}}"
}
print("Workbook Consolidation Complete")
print("=" * 35)
print(json.dumps(consolidated_report, indent=2))
- id: save_consolidated_report
name: save_consolidated_report
tool: OUTPUT_FILE
input:
- name: content
value: "{{steps.consolidate_workbook_data.result.stdout}}"
- name: filename
value: "workbook_consolidation_report.json"
- name: fileType
value: "json"
Example: Data Quality Analysis
- id: retrieve_data_for_analysis
name: retrieve_data_for_analysis
tool: GOOGLE_SHEETS_GET_SHEET
config:
- name: credentials
value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
input:
- name: sheet_id
value: "{{input.analysis_sheet_id}}"
- name: sheet_name
value: "{{input.sheet_name}}"
- name: include_formatting
value: true
- name: include_metadata
value: true
- id: perform_data_quality_analysis
name: perform_data_quality_analysis
tool: PYTHON_SANDBOX_RUN
input:
- name: code
value: |
import json
import re
from collections import Counter
# Analyze data quality
sheet_data = json.loads('''{{steps.retrieve_data_for_analysis.result.sheet_data}}''')
rows = sheet_data.get('rows', [])
if not rows:
print("No data found for analysis")
else:
headers = rows[0] if rows else []
data_rows = rows[1:] if len(rows) > 1 else []
print("Data Quality Analysis Report")
print("=" * 35)
print(f"Dataset: {sheet_data.get('sheet_name', 'Unknown')}")
print(f"Total Records: {len(data_rows)}")
print(f"Total Columns: {len(headers)}")
print()
# Column-by-column analysis
quality_report = {
"overall_stats": {
"total_records": len(data_rows),
"total_columns": len(headers),
"completeness_score": 0
},
"column_analysis": [],
"data_issues": []
}
total_cells = 0
empty_cells = 0
for col_idx, header in enumerate(headers):
col_values = [row[col_idx] if col_idx < len(row) else "" for row in data_rows]
# Calculate column statistics
non_empty_values = [v for v in col_values if v.strip()]
empty_count = len(col_values) - len(non_empty_values)
completeness = (len(non_empty_values) / len(col_values)) * 100 if col_values else 0
# Data type detection
numeric_count = sum(1 for v in non_empty_values if str(v).replace('.', '').replace('-', '').isdigit())
date_count = sum(1 for v in non_empty_values if re.match(r'\d{1,2}[/-]\d{1,2}[/-]\d{2,4}', str(v)))
email_count = sum(1 for v in non_empty_values if '@' in str(v) and '.' in str(v))
likely_type = "text"
if numeric_count / len(non_empty_values) > 0.8 if non_empty_values else False:
likely_type = "numeric"
elif date_count / len(non_empty_values) > 0.5 if non_empty_values else False:
likely_type = "date"
elif email_count / len(non_empty_values) > 0.5 if non_empty_values else False:
likely_type = "email"
col_analysis = {
"column_name": header,
"completeness_percent": round(completeness, 2),
"empty_cells": empty_count,
"likely_data_type": likely_type,
"unique_values": len(set(non_empty_values)),
"sample_values": non_empty_values[:3]
}
quality_report["column_analysis"].append(col_analysis)
total_cells += len(col_values)
empty_cells += empty_count
# Identify potential issues
if completeness < 90:
quality_report["data_issues"].append(f"Column '{header}' has low completeness ({completeness:.1f}%)")
if len(set(non_empty_values)) == 1 and non_empty_values:
quality_report["data_issues"].append(f"Column '{header}' has only one unique value")
# Overall completeness score
overall_completeness = ((total_cells - empty_cells) / total_cells) * 100 if total_cells > 0 else 0
quality_report["overall_stats"]["completeness_score"] = round(overall_completeness, 2)
print(f"Overall Data Completeness: {overall_completeness:.1f}%")
print()
# Display column analysis
print("Column Analysis:")
for col in quality_report["column_analysis"]:
print(f" 📊 {col['column_name']}")
print(f" Completeness: {col['completeness_percent']}%")
print(f" Type: {col['likely_data_type']}")
print(f" Unique Values: {col['unique_values']}")
print()
# Display issues
if quality_report["data_issues"]:
print("🚨 Data Quality Issues:")
for issue in quality_report["data_issues"]:
print(f" - {issue}")
else:
print("✅ No major data quality issues detected")
print("\n" + json.dumps(quality_report, indent=2))
- id: create_quality_recommendations
name: create_quality_recommendations
tool: OPENAI_INVOKE
config:
- name: version
value: gpt-4
input:
- name: prompt
value: |
Based on this data quality analysis report, please provide specific recommendations for improving data quality:
{{steps.perform_data_quality_analysis.result.stdout}}
Please provide:
1. Priority issues to address first
2. Specific steps to improve data completeness
3. Data validation rules to implement
4. Best practices for ongoing data maintenance
5. Suggested data cleaning workflows
- id: save_quality_analysis_report
name: save_quality_analysis_report
tool: OUTPUT_FILE
input:
- name: content
value: |
Google Sheets Data Quality Analysis Report
========================================
Generated: {{date | format('YYYY-MM-DD HH:mm:ss')}}
## Analysis Results
{{steps.perform_data_quality_analysis.result.stdout}}
## Recommendations
{{steps.create_quality_recommendations.result.content}}
## Next Steps
1. Address high-priority data quality issues
2. Implement recommended data validation rules
3. Set up regular data quality monitoring
4. Train users on data entry best practices
5. Consider automated data cleaning workflows
- name: filename
value: "data_quality_analysis_{{date | format('YYYY-MM-DD')}}.txt"
- name: fileType
value: "txt"
Sheet Retrieval Options
Basic Retrieval
- sheet_id: Google Sheets document ID
- sheet_name: Specific sheet tab name (optional)
- range: Specific cell range (optional)
Advanced Options
- include_formatting: Include cell formatting information
- include_metadata: Include sheet metadata (owner, modified date, etc.)
- get_sheet_list: Retrieve list of all sheets in workbook
- preserve_empty_cells: Include empty cells in output
Standard Response
{
"sheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
"sheet_name": "Sheet1",
"rows": [
["Header 1", "Header 2", "Header 3"],
["Value 1", "Value 2", "Value 3"],
["Value 4", "Value 5", "Value 6"]
],
"metadata": {
"modified_time": "2024-01-15T10:30:00Z",
"owner": "user@example.com"
}
}
{
"sheet_data": {
"rows": [...],
"formatting": {
"cell_formats": {
"A1": {"bold": true, "background_color": "#ffff00"},
"B1": {"italic": true, "text_color": "#ff0000"}
}
}
}
}
Use Cases
- Data Analysis: Retrieve complete datasets for analysis
- Reporting: Generate reports from Google Sheets data
- Data Migration: Export data from Google Sheets to other systems
- Quality Assurance: Analyze data quality and completeness
- Backup and Archive: Create backups of important spreadsheets
- Integration: Connect Google Sheets data to other workflow tools
- Dashboard Creation: Pull data for dashboard and visualization tools
- Compliance Reporting: Extract data for regulatory reporting
Best Practices
- Retrieve only necessary data ranges
- Use appropriate inclusion options (formatting, metadata)
- Consider pagination for very large sheets
- Cache frequently accessed data
Data Handling
- Always validate retrieved data structure
- Handle empty cells and missing data gracefully
- Implement error handling for access permission issues
- Consider data privacy and security requirements
Integration Strategy
- Use structured data formats for downstream processing
- Implement data quality checks after retrieval
- Log data retrieval operations for audit trails
- Plan for rate limiting and quota management