DATAFLARE [LAB]
Overview
  • Introduction
  • Architecture
  • Authentication
REST API
  • Datasets
RPC Services
  • RPC Integration
SDKs
  • Python SDK
  • TypeScript SDK
  • Go SDK
  • MCP Server
  • VS Code
  • Support
Future
  • Roadmap

Developer Portal

Integrate our high-performance HTTP and RPC endpoints directly into your pipelines. Access chunked legal documents and dense vector embeddings with minimal latency.

REST API

Standard JSON/BSON over HTTP. Best for metadata queries and dataset exploration.

api.dataflare.com/v1

RPC Services

High-throughput binary streaming. Best for consuming massive vector datasets directly.

rpc.dataflare.com:443

Architecture Overview

The Dataflare ecosystem is designed for high-throughput data pipelines and seamless AI agent integration.

[Diagram: AI Agents (Claude / IDEs), LLM Pipelines (Python / Go), and Web Apps (Next.js / Browser) connect through the Dataflare SDKs (v0.1.6) and the MCP Server Proxy to the Edge Gateway, which routes RPC/gRPC traffic through an Auth Guard to the Datasets and Enrich services.]

Our hybrid architecture leverages REST for standard metadata operations and gRPC for high-performance vector streaming. The Model Context Protocol (MCP) layer acts as a bridge, allowing AI models to securely call Dataflare tools without custom boilerplate.

Authentication

All REST API and RPC requests must be authenticated using your securely generated API key.

API Header Auth

Generate your exclusive API key from the Partner Dashboard. You must pass it in the x-api-key header for every HTTP request or gRPC invocation.

HTTP Request
POST /v1/datasets HTTP/1.1
Host: api.dataflare.com
x-api-key: dfk_live_your_api_key_here
Content-Type: application/json

Datasets

Search and retrieve paginated chunks of curated legal & technical Arabic text.

POST /v1/datasets

JSON Body Parameters

  • dataset (string)
    Required. E.g., "legal", "medical".
  • limit (int32)
    Optional. Number of records to return.
  • cursor (string)
    Optional. Pagination token from the previous request.
  • search_term (string)
    Optional. Semantic search query (if the vector store is enabled).
  • filters (map<string, string>)
    Optional. Filter by specific metadata.

Response (200 OK)

{
  "dataset": "legal",
  "data": [
    {
      "id": "69b5d123c...",
      "content": "بموجب أحكام المادة (٤) من القانون...",
      "metadata": {
        "title": "اللائحة التنفيذية لقانون الشركات...",
        "summary": "قرار إداري بشأن تنظيم اللوائح..."
      }
    }
  ],
  "count": 1,
  "nextCursor": "69b5d122b2fef71e909ce883",
  "latency": "1.737513ms",
  "fields": [
    "category",
    "content",
    "decision",
    "title"
  ]
}
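
The `nextCursor` field drives pagination: pass it back as `cursor` on the next request and stop when the server omits it. A minimal sketch of that loop (`fetch_page` is a stand-in for whatever HTTP client issues the POST above):

```python
def iterate_dataset(fetch_page, dataset, limit=100):
    """Yield documents page by page, threading nextCursor through requests."""
    cursor = None
    while True:
        body = {"dataset": dataset, "limit": limit}
        if cursor:
            body["cursor"] = cursor
        page = fetch_page(body)          # e.g. POST /v1/datasets
        yield from page.get("data", [])
        cursor = page.get("nextCursor")
        if not cursor:                   # no cursor means the last page
            return
```

The SDK paginators described later wrap exactly this kind of loop for you.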

RPC Integration

For LLM training pipelines, REST overhead is unacceptable. Use our compiled RPC services to stream dense embeddings straight into RAPIDS or PyTorch.

Service Reflection

Our RPC endpoints support server reflection. You can use grpcurl to inspect services such as DatasetService and EnrichService directly from your terminal.

grpcurl -H "x-api-key: $DF_API_KEY" rpc.dataflare.com:443 list

dfapi.v1.DatasetService
dfapi.v1.EnrichService

Executing a Query

Once you have inspected the services, you can make direct RPC calls. Here is an example of querying DatasetService.Query using the same JSON payload structure as the REST API.

grpcurl -d '{"dataset": "legal", "limit": 10}' \
  -H "x-api-key: $DF_API_KEY" \
  rpc.dataflare.com:443 dfapi.v1.DatasetService.Query

Native Streaming

For large-scale vector embeddings or multi-gigabyte dataset extractions, we highly recommend utilizing the DatasetService.Stream method via Python's grpcio library to maintain memory efficiency in your PyTorch or RAPIDS training loops.
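
Whatever the transport, the client-side memory discipline is the same: consume the stream lazily and hand fixed-size batches to the training loop instead of materializing the whole response. A hedged sketch (the stream here is any iterator, such as the response iterator a gRPC streaming stub returns):

```python
from itertools import islice

def batched(stream, batch_size):
    """Group a lazy record stream into fixed-size lists without buffering it all."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical usage with a gRPC streaming stub:
# for batch in batched(stub.Stream(request), batch_size=512):
#     train_step(batch)   # feed each batch to PyTorch / RAPIDS
```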

Python SDK

The official Python client for the Dataflare API. Typed models, connection pooling, resilient retries, and idiomatic paginators — all in one package.

v0.1.6 on PyPI · GitHub Repository · Python ≥ 3.9

Installation

REST Client

pip install dataflare-sdk

With gRPC Support

pip install "dataflare-sdk[grpc]"

Authentication

Set your API key as an environment variable. The SDK will pick it up automatically — no manual configuration needed.

export DF_API_KEY="dfk_live_your_api_key_here"

Streaming Datasets

Use the built-in paginator to iterate over chunked legal documents. Cursor management is handled automatically.

Python
from df import DFClient, AuthenticationError

try:
    with DFClient() as client:
        for doc in client.datasets.stream("legal", search_term="التأمين", limit=100):
            print(f"Doc category: {doc.category} | Title: {doc.title}")
            if doc.source_url:
                client.datasets.download_file(
                    doc.source_url,
                    destination=f"./archives/{doc.id}.pdf"
                )
except AuthenticationError:
    print("Invalid API Key.")

gRPC Client

Drop-in replacement for low-latency environments. Install the grpc extra and swap the client class.

Python (gRPC)
from df import DFGRPCClient

with DFGRPCClient() as client:
    results, next_cursor = client.datasets.query(
        dataset="legal",
        limit=10
    )

Typed Models

Full Pydantic schemas for IDE autocompletion and validation.

Connection Pools

Optimized httpx connection reuse for maximum throughput.

Resilient Retries

Automated tenacity-backed retries for rate limits and network faults.

Idiomatic Paginators

Cursor injection handled automatically via stream().

Safe Downloads

Memory-safe chunked byte streaming for raw file retrieval.

gRPC Support

Optional high-performance gRPC client with server reflection.
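
The tenacity-backed retries mentioned above boil down to a loop like the following sketch (function name, defaults, and the retried exception set are illustrative, not the SDK's actual API):

```python
import time

def with_retries(call, max_attempts=4, base_delay=0.5, retry_on=(ConnectionError,)):
    """Retry `call` on transient errors, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise               # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))
```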

TypeScript SDK

Modern, type-safe client for Node.js and Browser environments. Built with Zod for runtime validation and Axios for robust connection management.

v0.1.6 on NPM · GitHub Repository · Node.js ≥ 16

Installation

npm install dataflare-sdk

REST: Streaming & Paginating

Leverage native for await...of loops to consume large datasets without worrying about cursor management.

TypeScript (REST)
import { DFClient } from "dataflare-sdk";

const client = new DFClient();

async function main() {
    for await (const doc of client.datasets.stream("legal", { limit: 100 })) {
        console.log("Processing:", doc.id);
    }
}

gRPC: Performance Mode

Drop-in replacement using Server Reflection. No compiled stubs required.

TypeScript (gRPC)
import { DFGRPCClient } from "dataflare-sdk";

const rpc = new DFGRPCClient();

const [docs, next] = await rpc.datasets.query("legal", { limit: 10 });

Type Safe

Zod-powered runtime validation and static type inference.

Auto Retries

Built-in exponential backoff for network and server errors.

gRPC Support

Server reflection for high-performance direct connections.

Pro Exceptions

Dedicated Error classes for Auth, Rate Limits, and API faults.

Detailed Error Mapping

Status Code   Exception Class       Description
401 / 403     AuthenticationError   Invalid or expired API key.
429           RateLimitError        Request limit exceeded. Retries happen automatically.
5xx           APIError              Internal server error or transient gateway fault.
Validation    DFError               Zod validation failure or malformed payload.
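
Expressed as a small Python sketch of the same mapping (the class names come from the table above; the dispatch function itself is illustrative):

```python
class DFError(Exception):
    """Base error for validation failures and malformed payloads."""

class AuthenticationError(DFError):
    """401 / 403: invalid or expired API key."""

class RateLimitError(DFError):
    """429: request limit exceeded."""

class APIError(DFError):
    """5xx: internal server error or transient gateway fault."""

def exception_for_status(status):
    """Pick the exception class matching an HTTP status code."""
    if status in (401, 403):
        return AuthenticationError
    if status == 429:
        return RateLimitError
    if 500 <= status < 600:
        return APIError
    return DFError
```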

Go SDK

High-concurrency, idiomatic client for Go backend services. Featuring native channel-based streaming and gRPC with server reflection.

v0.1.6 (Latest) · GitHub Repository · Go ≥ 1.21

Installation

go get github.com/dataflarelab/df-sdk/go

Implementation Examples

Choose between idiomatic channel-based streaming (REST) or high-performance binary protocols (gRPC).

Go (REST Streaming)
import (
    "fmt"

    dataflare "github.com/dataflarelab/df-sdk/go"
)

client := dataflare.NewClient(&dataflare.ClientOptions{APIKey: "key"})

docChan, errChan := client.Datasets.Stream("legal", nil)

for doc := range docChan {
    fmt.Println(doc.ID)
}

// Check for a terminal error once the document channel closes.
if err := <-errChan; err != nil {
    fmt.Println("stream error:", err)
}
Go (gRPC Reflection)
import (
    "context"

    dataflare "github.com/dataflarelab/df-sdk/go"
)

client, err := dataflare.NewGRPCClient("rpc.dataflare.com:443")
if err != nil {
    panic(err)
}
defer client.Close()

// Requests use binary protobuf encoding
err = client.Call(context.Background(), "GetDataset", req, resp)

MCP Server

The Model Context Protocol (MCP) server for Dataflare. Expose dataset tools directly to AI agents (Claude Desktop, IDEs, etc.) for autonomous data exploration.

v0.1.6 on NPM · GitHub Repository · Node.js ≥ 18

Installation

npm install dataflare-mcp-server

Quick Start: Claude Desktop

Add the following configuration to your claude_desktop_config.json to start using Dataflare tools in Claude.

{
  "mcpServers": {
    "dataflare": {
      "command": "npx",
      "args": ["-y", "dataflare-mcp-server"],
      "env": {
        "DF_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}

VS Code Extension

Explore, search, and query Dataflare datasets directly from your IDE. Built for speed, security, and world-class AI research.

Marketplace

Global Explorer

Browse curated datasets (Legal, Financial, Medical, News) directly in the VS Code sidebar.

Premium Inspector

Sleek glassmorphic interface to inspect documents and copy fetch snippets instantly.

Quick Start

1. Install the extension from the VS Code Marketplace.

2. Activate by running the following command in the palette (Cmd+Shift+P):

   Dataflare: Set API Key

3. Explore datasets in the Activity Bar or right-click any code selection to search.

Need help with integration?

Join our community of sovereign intelligence developers. Get direct support from our engineers and contribute to our open-source SDKs.

Discord Community

Join the conversation and get real-time fixes.

Direct Support

Dedicated assistance for enterprise partners.

Issue Tracker

Report bugs and request new features on GitHub.

Roadmap

Features currently in development for the Dataflare API ecosystem.

✓ Released

Official SDKs

Native Python, TypeScript, and Go clients are now live for seamless pipeline integration.

Coming Soon

Enrichment Service

Direct access to dfapi.v1.EnrichService for programmatic PII redaction, entity extraction, and semantic tagging.

Private Beta

Vector Embeddings API

Generate dense, native-Arabic vector embeddings on-the-fly without needing to load massive models into your own VRAM.

Private Beta

Inference Endpoints

OpenAI-compatible endpoints mapped directly to our proprietary sovereign alignment language models.
