Developer Portal
Integrate our high-performance HTTP and RPC endpoints directly into your pipelines. Access chunked legal documents and dense vector embeddings with minimal latency.
REST API
Standard JSON/BSON over HTTP. Best for metadata queries and dataset exploration.
RPC Services
High-throughput binary streaming. Best for consuming massive vector datasets directly.
Architecture Overview
The Dataflare ecosystem is designed for high-throughput data pipelines and seamless AI agent integration.
Our hybrid architecture leverages REST for standard metadata operations and gRPC for high-performance vector streaming. The Model Context Protocol (MCP) layer acts as a bridge, allowing AI models to securely call Dataflare tools without custom boilerplate.
Authentication
All REST API and RPC requests must be authenticated using your securely generated API key.
API Header Auth
Generate your exclusive API key from the Partner Dashboard. You must pass it in the x-api-key header for every HTTP request or gRPC invocation.
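As a concrete sketch in Python (standard library only), the header can be attached like this. The `x-api-key` header name and `DF_API_KEY` environment variable come from this reference; the URL in the comment is a placeholder, not a documented endpoint:

```python
import os

def auth_headers(api_key: str) -> dict:
    """Build the headers Dataflare expects on every HTTP request."""
    return {"x-api-key": api_key, "Content-Type": "application/json"}

# Read the key from the environment so it never lands in source control.
headers = auth_headers(os.environ.get("DF_API_KEY", "<your-api-key>"))

# With an HTTP client such as `requests` (placeholder URL):
#   requests.get("https://api.dataflare.com/v1/...", headers=headers)
```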
Datasets
Search and retrieve paginated chunks of curated legal & technical Arabic text.
JSON Body Parameters
- `dataset` (string): Required. E.g., "legal", "medical".
- `limit` (int32): Optional. Number of records to return.
- `cursor` (string): Optional. Pagination token from the previous request.
- `search_term` (string): Optional. Semantic search query (if the vector store is enabled).
- `filters` (map<string, string>): Optional. Filter by specific metadata fields.
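A request body assembled from these parameters might look as follows (a sketch: the field names come from this reference, but the filter value is illustrative, and the exact endpoint URL is not shown here):

```python
import json

# Build a query body from the parameters above.
body = {
    "dataset": "legal",                      # required
    "limit": 25,                             # optional page size
    "search_term": "التأمين",                 # optional semantic query ("insurance")
    "filters": {"category": "regulation"},   # optional metadata filter
}

# On subsequent pages, echo the `nextCursor` value from the response:
# body["cursor"] = previous_response["nextCursor"]

payload = json.dumps(body, ensure_ascii=False)
```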
Response (200 OK)
{
"dataset": "legal",
"data": [
{
"id": "69b5d123c...",
"content": "بموجب أحكام المادة (٤) من القانون...",
"metadata": {
"title": "اللائحة التنفيذية لقانون الشركات...",
"summary": "قرار إداري بشأن تنظيم اللوائح..."
}
}
],
"count": 1,
"nextCursor": "69b5d122b2fef71e909ce883",
"latency": "1.737513ms",
"fields": [
"category",
"content",
"decision",
"title"
]
}
RPC Integration
For LLM training pipelines, REST overhead is unacceptable. Use our compiled RPC services to stream dense embeddings straight into RAPIDS or PyTorch.
Service Reflection
Our RPC endpoints support server reflection. You can use grpcurl to inspect services such as DatasetService and EnrichService directly from your terminal.
dfapi.v1.DatasetService
dfapi.v1.EnrichService
Executing a Query
Once you have inspected the services, you can make direct RPC calls. Here is an example of querying DatasetService.Query using the same JSON payload structure as the REST API.
grpcurl -d '{"dataset": "legal", "limit": 10}' \
  -H "x-api-key: $DF_API_KEY" \
  rpc.dataflare.com:443 dfapi.v1.DatasetService.Query
Native Streaming
For large-scale vector embeddings or multi-gigabyte dataset extractions, we highly recommend utilizing the DatasetService.Stream method via Python's grpcio library to maintain memory efficiency in your PyTorch or RAPIDS training loops.
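As a minimal sketch of that pattern, assuming `grpcio` and stubs generated from the Dataflare protos (the stub and request names in the comments are assumptions, not documented API), a small batching helper keeps only one batch of embeddings resident in memory at a time:

```python
def batched(stream, batch_size=256):
    """Group records from a gRPC response stream into fixed-size batches
    so only one batch of embeddings is in memory at a time."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def stream_embeddings(api_key, dataset="legal"):
    # Connection sketch; requires `grpcio` and generated stubs.
    import grpc
    channel = grpc.secure_channel("rpc.dataflare.com:443",
                                  grpc.ssl_channel_credentials())
    metadata = [("x-api-key", api_key)]
    # Stub and request names below are assumptions:
    # stub = dfapi_pb2_grpc.DatasetServiceStub(channel)
    # for batch in batched(stub.Stream(request, metadata=metadata)):
    #     ...feed `batch` into your PyTorch / RAPIDS loop...
```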
Python SDK
The official Python client for the Dataflare API. Typed models, connection pooling, resilient retries, and idiomatic paginators — all in one package.
Installation
REST Client
With gRPC Support
Authentication
Set your API key as an environment variable. The SDK will pick it up automatically — no manual configuration needed.
Streaming Datasets
Use the built-in paginator to iterate over chunked legal documents. Cursor management is handled automatically.
from df import DFClient, AuthenticationError

try:
    with DFClient() as client:
        for doc in client.datasets.stream("legal", search_term="التأمين", limit=100):
            print(f"Doc category: {doc.category} | Title: {doc.title}")
            if doc.source_url:
                client.datasets.download_file(
                    doc.source_url,
                    destination=f"./archives/{doc.id}.pdf"
                )
except AuthenticationError:
    print("Invalid API Key.")
gRPC Client
Drop-in replacement for low-latency environments. Install the grpc extra and swap the client class.
from df import DFGRPCClient

with DFGRPCClient() as client:
    results, next_cursor = client.datasets.query(
        dataset="legal",
        limit=10
    )
Typed Models
Full Pydantic schemas for IDE autocompletion and validation.
Connection Pools
Optimized httpx connection reuse for maximum throughput.
Resilient Retries
Automated tenacity-backed retries for rate limits and network faults.
Idiomatic Paginators
Cursor injection handled automatically via stream().
Safe Downloads
Memory-safe chunked byte streaming for raw file retrieval.
gRPC Support
Optional high-performance gRPC client with server reflection.
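The retry behaviour above is automatic in the SDK; for readers wiring up their own client, the underlying pattern looks like this (a plain-Python sketch of exponential backoff with jitter, not the SDK's actual implementation):

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `fn` with exponential backoff plus jitter, the same pattern
    the SDK applies for rate limits and transient network faults."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff grows 0.5s, 1s, 2s, ... with a little random jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The `sleep` parameter is injected so the delay strategy can be swapped or disabled in tests.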
TypeScript SDK
Modern, type-safe client for Node.js and Browser environments. Built with Zod for runtime validation and Axios for robust connection management.
Installation
REST: Streaming & Paginating
Leverage native for await...of loops to consume large datasets without worrying about cursor management.
import { DFClient } from "dataflare-sdk";

const client = new DFClient();

async function main() {
  for await (const doc of client.datasets.stream("legal", { limit: 100 })) {
    console.log("Processing:", doc.id);
  }
}
gRPC: Performance Mode
Drop-in replacement using Server Reflection. No compiled stubs required.
import { DFGRPCClient } from "dataflare-sdk";

const rpc = new DFGRPCClient();
const [docs, next] = await rpc.datasets.query("legal", { limit: 10 });
Type Safe
Zod-powered runtime validation and static type inference.
Auto Retries
Built-in exponential backoff for network and server errors.
gRPC Support
Server reflection for high-performance direct connections.
Pro Exceptions
Dedicated Error classes for Auth, Rate Limits, and API faults.
Detailed Error Mapping
| Status Code | Exception Class | Description |
|---|---|---|
| 401 / 403 | AuthenticationError | Invalid or expired API key. |
| 429 | RateLimitError | Request limit exceeded. Retries happen automatically. |
| 5xx | APIError | Internal server error or transient gateway fault. |
| Validation | DFError | Zod validation failure or malformed payload. |
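The table can be restated as a small lookup (shown in Python for consistency with the SDK examples above; this is a hypothetical helper for illustration, not part of the SDK, and statuses outside the table fall through to `DFError` here):

```python
def classify_error(status: int) -> str:
    """Map an HTTP status to the SDK exception class per the table above."""
    if status in (401, 403):
        return "AuthenticationError"
    if status == 429:
        return "RateLimitError"   # the SDK retries these automatically
    if 500 <= status <= 599:
        return "APIError"
    return "DFError"              # validation failure / malformed payload
```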
Go SDK
High-concurrency, idiomatic client for Go backend services. Featuring native channel-based streaming and gRPC with server reflection.
Installation
Implementation Examples
Choose between idiomatic channel-based streaming (REST) or high-performance binary protocols (gRPC).
import (
	"fmt"

	dataflare "github.com/dataflarelab/df-sdk/go"
)

client := dataflare.NewClient(&dataflare.ClientOptions{APIKey: "key"})
docChan, errChan := client.Datasets.Stream("legal", nil)
for doc := range docChan {
	fmt.Println(doc.ID)
}
// Surface any streaming error once the channel is drained.
if err := <-errChan; err != nil {
	fmt.Println(err)
}
import (
	"context"

	dataflare "github.com/dataflarelab/df-sdk/go"
)

client, _ := dataflare.NewGRPCClient("rpc.dataflare.com:443")
defer client.Close()

// Requests use binary protobuf encoding
err := client.Call(context.Background(), "GetDataset", req, resp)
MCP Server
The Model Context Protocol (MCP) server for Dataflare. Expose dataset tools directly to AI agents (Claude Desktop, IDEs, etc.) for autonomous data exploration.
Installation
Quick Start: Claude Desktop
Add the following configuration to your claude_desktop_config.json to start using Dataflare tools in Claude.
{
"mcpServers": {
"dataflare": {
"command": "npx",
"args": ["-y", "dataflare-mcp-server"],
"env": {
"DF_API_KEY": "YOUR_API_KEY_HERE"
}
}
}
}
VS Code Extension
Explore, search, and query Dataflare datasets directly from your IDE. Built for speed, security, and world-class AI research.
Global Explorer
Browse curated datasets (Legal, Financial, Medical, News) directly in the VS Code sidebar.
Premium Inspector
Sleek glassmorphic interface to inspect documents and copy fetch snippets instantly.
Quick Start
Install the extension from the VS Code Marketplace.
Activate by running the following command in the palette (Cmd+Shift+P):
Explore datasets in the Activity Bar or right-click any code selection to search.
Need help with integration?
Join our community of sovereign intelligence developers. Get direct support from our engineers and contribute to our open-source SDKs.
Roadmap
Features currently in development for the Dataflare API ecosystem.
✓ Released
Official SDKs
Native Python, TypeScript, and Go clients are now live for seamless pipeline integration.
Coming Soon
Enrichment Service
Direct access to dfapi.v1.EnrichService for programmatic PII redaction, entity extraction, and semantic tagging.
Private Beta
Vector Embeddings API
Generate dense, native-Arabic vector embeddings on-the-fly without needing to load massive models into your own VRAM.
Private Beta
Inference Endpoints
OpenAI-compatible endpoints mapped directly to our proprietary sovereign alignment language models.