import { Badge } from "zudoku/ui/Badge";
import { Button } from "zudoku/ui/Button";

![Beans banner](/beans-banner.png)

# Beans API & MCP

<Badge variant="default">v0.1</Badge><Badge className="badge-live">Live</Badge>

Beans is a news and blog aggregator. It collects data from **7 000+ sources/publishers** every day.

## Key features

- **Vector semantic search** — Natural language queries with configurable accuracy thresholds
- **Comprehensive filtering** — By tags (categories, entities, regions), sources, and time ranges
- **Entity & sentiment enrichment** — Automatic extraction of sentiment, named entities, and metadata optimized for analytics and AI pipelines
- **Trend scoring** — Articles ranked by social engagement metrics for relevance and timeliness
- **Cross-publisher linking** — Related articles mapped across the entire feed

## Authentication

<Button className="btn-with-link" asChild>
  <a href="/settings/api-keys">Get API Key</a>
</Button>

```
Authorization: Bearer YOUR-API-KEY
```

## Base URL

All Beans endpoints live under the `/beans` path prefix.

```bash
BASE_URL="https://api.cafecito.tech"
API_KEY="YOUR-API-KEY"
```

## Core endpoints

### Articles

| Endpoint | Description |
| -------- | ----------- |
| `GET /beans/articles/top-headlines` | Top trending headlines from the past 24 hours, ranked by trend score |
| `GET /beans/articles/latest` | Most recently published articles, sorted by publish date (newest first) |
| `GET /beans/articles/trending` | Trending articles ranked by trend score (based on social engagement) |
| `GET /beans/articles/search` | Semantic or tag-based search across all articles in the database |

### Sources

| Endpoint | Description |
| -------- | ----------- |
| `GET /beans/sources` | Retrieves detailed metadata for sources (site name, description, favicon) |

### Tags / Metadata

| Endpoint | Description |
| -------- | ----------- |
| `GET /beans/tags/categories` | Paginated list of unique article categories/topics |
| `GET /beans/tags/entities` | Paginated list of named entities (persons, orgs, products, places) |
| `GET /beans/tags/regions` | Paginated list of geographic regions mentioned in articles |

:::info{title="Pagination"}
All metadata endpoints support `offset` (default 0) and `limit` (default 16, max 128).
:::
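
To walk a full tag list, a client can keep requesting pages until one comes back short. The sketch below assumes that a page with fewer than `limit` items marks the end of the list, which this page does not state explicitly. `iter_all` and `fetch_page` are hypothetical names, not part of the Beans API.

```python
def iter_all(fetch_page, limit=128):
    """Yield every item from an offset/limit endpoint.

    fetch_page(offset, limit) is a hypothetical callable that returns
    one page of results as a list.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit) or []
        yield from page
        # Assumption: a short (or empty) page means there is nothing left.
        if len(page) < limit:
            break
        offset += limit
```

With `requests`, `fetch_page` could wrap a `GET /beans/tags/categories` call that passes `offset` and `limit` as query parameters and returns `resp.json()`.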

## Query parameters

### Articles

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `q` | string (3–512) | Optional semantic search query (natural language, triggers vector embedding) |
| `acc` | number (0–1) | Embedding accuracy/similarity threshold — higher = stricter match (default `0.75`) |
| `content_type` | string | Content type filter: `news`, `blog`, `post`, `comment`, etc. |
| `tags` | string[] | Case- and whitespace-insensitive filter applied across categories, regions, and entities combined (recommended). For example, `AI`, `ai`, and `#ai` are equivalent. Multiple tags are combined with AND. |
| `categories` | string[] | Precise category topic filters — inclusive OR, case/whitespace-sensitive |
| `regions` | string[] | Precise geographic region filters — inclusive OR, case/whitespace-sensitive |
| `entities` | string[] | Precise named entity filters — inclusive OR, case/whitespace-sensitive |
| `sources` | string[] | Publisher/source ID filters — inclusive OR |
| `from` | date (YYYY-MM-DD) | **Latest/Trending only:** Articles published/trending since this date (defaults to 7 days ago) |
| `full_content` | boolean | Include full article text (default `false`) — large payload |
| `limit` | integer (1–128) | Results per page (default `16`) |
| `offset` | integer | Pagination offset — number of items to skip (default `0`) |

**Response:** `200 OK` → array of article objects.
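
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, assuming array parameters are sent as repeated keys, as the cURL examples on this page do. `build_article_query` is a hypothetical helper, not part of any Beans SDK.

```python
from urllib.parse import urlencode


def build_article_query(params):
    """Validate documented ranges and return an encoded query string.

    Hypothetical helper; the ranges come from the parameter table above.
    """
    q = params.get("q")
    if q is not None and not 3 <= len(q) <= 512:
        raise ValueError("q must be 3-512 characters")
    acc = params.get("acc")
    if acc is not None and not 0 <= acc <= 1:
        raise ValueError("acc must be between 0 and 1")
    limit = params.get("limit")
    if limit is not None and not 1 <= limit <= 128:
        raise ValueError("limit must be between 1 and 128")

    cleaned = {}
    for key, value in params.items():
        if value is None:
            continue
        # Send booleans as lowercase strings, matching the cURL examples.
        if isinstance(value, bool):
            value = "true" if value else "false"
        cleaned[key] = value
    # doseq=True expands lists into repeated keys: tags=a&tags=b
    return urlencode(cleaned, doseq=True)
```

The returned string can be appended to an article endpoint URL after a `?`. Passing a plain dict also sidesteps the fact that `from` is a reserved word in Python and cannot be used as a keyword argument.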

### Sources

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `sources` | string[] | **Required.** Source IDs to fetch metadata for (CSV, case-sensitive) |
| `limit` | integer (1–128) | Items per page (default `16`) |
| `offset` | integer | Pagination offset (default `0`) |

**Response:** `200 OK` → array of `Publisher` objects.
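
A common pattern is to collect the source IDs from a batch of articles, fetch their metadata once, and join the two. The sketch below assumes each article carries a `source` ID and each `Publisher` object exposes the same ID under `source`, as the examples on this page suggest. Both function names are hypothetical.

```python
def unique_source_ids(articles):
    """Collect distinct `source` IDs from articles, keeping first-seen order."""
    return list(dict.fromkeys(a["source"] for a in articles if a.get("source")))


def attach_publishers(articles, publishers):
    """Return copies of the articles with a `publisher` key looked up by source ID.

    Articles whose source has no matching publisher get `publisher: None`.
    """
    by_id = {p.get("source"): p for p in publishers}
    return [{**a, "publisher": by_id.get(a.get("source"))} for a in articles]
```

The list returned by `unique_source_ids` can be passed as the `sources` parameter of `GET /beans/sources`.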

### Tags / Metadata

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `limit` | integer (1–128) | Items per page (default `16`) |
| `offset` | integer | Pagination offset (default `0`) |

**Response:** `200 OK` → array of strings.

:::info
Check out the [API reference](/api/beans) for more details.
:::

## MCP Server

**Server URL**: `https://api.cafecito.tech/beans/mcp`

Beans endpoints are exposed as hosted MCP tools for AI agent integration. See the [MCP Integration guide](/howtos/mcp-howto) for more details.

## Examples

Below are real-world examples. Replace `YOUR-API-KEY` with the key you generated in the [developer portal](/settings/api-keys).

<CodeTabs syncKey="beans-api">
```js
const API_KEY = process.env.CAFECITO_API_KEY;
const BASE_URL = "https://api.cafecito.tech";
```

```python
import os

API_KEY = os.environ["CAFECITO_API_KEY"]
BASE_URL = "https://api.cafecito.tech"
```

```bash title="cURL"
API_KEY="YOUR-API-KEY"
BASE_URL="https://api.cafecito.tech"
```
</CodeTabs>

### 1. Health check — verify service is operational

<CodeTabs syncKey="beans-api">
```js
const res = await fetch(`${BASE_URL}/beans/health`, {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

if (!res.ok) throw new Error(`HTTP ${res.status}`);
const status = await res.json();
console.log("Service status:", status);
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/beans/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()

print("Service status:", resp.json())
```

```bash title="cURL"
curl -s "${BASE_URL}/beans/health" \
  -H "Authorization: Bearer ${API_KEY}"
```
</CodeTabs>

---

### 2. Top headlines from the last 24 hours

Get trending headlines ranked by trend score, perfect for building breaking news dashboards.

<CodeTabs syncKey="beans-api">
```js
const params = new URLSearchParams({
  limit: "10",
});

const res = await fetch(`${BASE_URL}/beans/articles/top-headlines?${params}`, {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

if (!res.ok) throw new Error(`HTTP ${res.status}`);
const headlines = await res.json();
headlines?.forEach((h) => console.log(h.title, "→", h.url));
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/beans/articles/top-headlines",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"limit": 10},
    timeout=30,
)
resp.raise_for_status()

for headline in resp.json() or []:
    print(headline.get("title"), "→", headline.get("url"))
```

```bash title="cURL"
curl -s "${BASE_URL}/beans/articles/top-headlines?limit=10" \
  -H "Authorization: Bearer ${API_KEY}"
```
</CodeTabs>

---

### 3. Latest news on market performance & economy

<CodeTabs syncKey="beans-api">
```js
const params = new URLSearchParams({
  q: "market performance and economy",
  content_type: "news",
  limit: "10",
});

const res = await fetch(`${BASE_URL}/beans/articles/latest?${params}`, {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

if (!res.ok) throw new Error(`HTTP ${res.status}`);
const articles = await res.json();
articles?.forEach((a) => console.log(a.title, "→", a.url));
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/beans/articles/latest",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={
        "q": "market performance and economy",
        "content_type": "news",
        "limit": 10,
    },
    timeout=30,
)
resp.raise_for_status()

for article in resp.json() or []:
    print(article.get("title"), "→", article.get("url"))
```

```bash title="cURL"
curl -s "${BASE_URL}/beans/articles/latest?q=market+performance+and+economy&content_type=news&limit=10" \
  -H "Authorization: Bearer ${API_KEY}"
```
</CodeTabs>

---

### 4. Trending news on Robotics in Saudi Arabia

Use the trending endpoint to surface content ranked by social engagement signals and trend score.

<CodeTabs syncKey="beans-api">
```js
const params = new URLSearchParams();
params.append("tags", "Robotics");
params.append("tags", "saudi arabia");
params.append("content_type", "news");
params.append("limit", "10");

const res = await fetch(`${BASE_URL}/beans/articles/trending?${params}`, {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

if (!res.ok) throw new Error(`HTTP ${res.status}`);
const articles = await res.json();
articles?.forEach((a) => console.log(a.title, "|", a.regions));
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/beans/articles/trending",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"tags": ["Robotics", "saudi arabia"], "content_type": "news", "limit": 10},
    timeout=30,
)
resp.raise_for_status()

for article in resp.json() or []:
    print(article.get("title"), "|", article.get("regions"))
```

```bash title="cURL"
curl -s "${BASE_URL}/beans/articles/trending?tags=Robotics&tags=saudi+arabia&content_type=news&limit=10" \
  -H "Authorization: Bearer ${API_KEY}"
```
</CodeTabs>

---

### 5. Semantic search on archive — find articles about AI safety concerns

Search across all articles using natural language and retrieve full content for RAG/summarization.

<CodeTabs syncKey="beans-api">
```js
const params = new URLSearchParams({
  q: "AI safety risks and concerns",
  acc: "0.8",
  full_content: "true",
  limit: "5",
});

const res = await fetch(`${BASE_URL}/beans/articles/search?${params}`, {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

if (!res.ok) throw new Error(`HTTP ${res.status}`);
const results = await res.json();
results?.forEach((a) => {
  console.log(`${a.title} [trend_score: ${a.trend_score}]`);
  console.log(`  → ${a.source} (${a.likes} likes, ${a.shares} shares)`);
  console.log(`  URL: ${a.url}\n`);
});
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/beans/articles/search",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={
        "q": "AI safety risks and concerns",
        "acc": 0.8,
        "full_content": True,
        "limit": 5,
    },
    timeout=30,
)
resp.raise_for_status()

for article in resp.json() or []:
    print(f"{article.get('title')} [trend_score: {article.get('trend_score')}]")
    print(f"  → {article.get('source')} ({article.get('likes')} likes, {article.get('shares')} shares)")
    print(f"  URL: {article.get('url')}\n")
```

```bash title="cURL"
curl -s "${BASE_URL}/beans/articles/search?q=AI+safety+risks+and+concerns&acc=0.8&full_content=true&limit=5" \
  -H "Authorization: Bearer ${API_KEY}"
```
</CodeTabs>

---

### 6. Get sources metadata

Retrieve site names, descriptions, and favicon URLs for a set of publishers.

<CodeTabs syncKey="beans-api">
```js
const params = new URLSearchParams();
params.append("sources", "barchart");
params.append("sources", "reuters");
params.append("sources", "techcrunch");

const res = await fetch(`${BASE_URL}/beans/sources?${params}`, {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

if (!res.ok) throw new Error(`HTTP ${res.status}`);
const publishers = await res.json();
publishers?.forEach((p) => {
  console.log(`${p.source_site_name}`);
  console.log(`  ID: ${p.source}`);
  console.log(`  URL: ${p.source_base_url}`);
  console.log(`  Description: ${p.source_description}\n`);
});
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/beans/sources",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"sources": ["barchart", "reuters", "techcrunch"]},
    timeout=30,
)
resp.raise_for_status()

for publisher in resp.json() or []:
    print(f"{publisher.get('source_site_name')}")
    print(f"  ID: {publisher.get('source')}")
    print(f"  URL: {publisher.get('source_base_url')}")
    print(f"  Description: {publisher.get('source_description')}\n")
```

```bash title="cURL"
curl -s "${BASE_URL}/beans/sources?sources=barchart&sources=reuters&sources=techcrunch" \
  -H "Authorization: Bearer ${API_KEY}"
```
</CodeTabs>

---

### 7. List article categories with pagination

Retrieve all unique categories/topics in the database.

<CodeTabs syncKey="beans-api">
```js
const params = new URLSearchParams({
  limit: "50",
  offset: "0",
});

const res = await fetch(`${BASE_URL}/beans/tags/categories?${params}`, {
  headers: { Authorization: `Bearer ${API_KEY}` },
});

if (!res.ok) throw new Error(`HTTP ${res.status}`);
const categories = await res.json();
console.log("Article categories:");
categories?.forEach((cat) => console.log(`  - ${cat}`));
```

```python
import requests

resp = requests.get(
    f"{BASE_URL}/beans/tags/categories",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"limit": 50, "offset": 0},
    timeout=30,
)
resp.raise_for_status()

categories = resp.json() or []
print("Article categories:")
for cat in categories:
    print(f"  - {cat}")
```

```bash title="cURL"
curl -s "${BASE_URL}/beans/tags/categories?limit=50&offset=0" \
  -H "Authorization: Bearer ${API_KEY}"
```
</CodeTabs>

---

## Best practices

- **Start with `acc=0.75`**, then tweak up for precision or down for recall.
- **Use the `from` parameter** (latest and trending endpoints only) to keep feeds fresh: specify dates as YYYY-MM-DD (e.g., `from=2026-03-10`).
- **Use `full_content=true`** sparingly — requests will be slower and payload larger; great for RAG/summarization pipelines.
- **Paginate with `offset` + `limit`** for stable ingestion pipelines and monitoring.
- **Use the `tags` parameter** (recommended) for flexible filtering across categories, regions, and entities in one parameter.
- **Use precise filters** (`categories`, `regions`, `entities`) only when you need exact matches (case-sensitive).
- **Combine search (`q`) with filters** — semantic search + tag filters = powerful precision.
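
Two of the tips above can be sketched in a few lines: computing a rolling `from` date, and relaxing `acc` step by step when a strict threshold returns too few results. This is an illustration under assumptions rather than a recommended client. `since`, `search_relaxing_acc`, the threshold steps, and the `search` callable are all hypothetical.

```python
from datetime import date, timedelta


def since(days, today=None):
    """Return the YYYY-MM-DD date `days` before `today`, for the `from` parameter."""
    today = today or date.today()
    return (today - timedelta(days=days)).isoformat()


def search_relaxing_acc(search, query, thresholds=(0.85, 0.75, 0.65), min_results=5):
    """Try strict thresholds first, relaxing until enough results come back.

    `search(params)` is a hypothetical callable that performs the request
    and returns a list of articles.
    """
    results = []
    for acc in thresholds:
        results = search({"q": query, "acc": acc}) or []
        if len(results) >= min_results:
            break
    # If no threshold yields enough, the loosest attempt is returned.
    return results
```

Starting strict and relaxing trades extra requests for higher precision when the strict threshold already returns enough matches.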

## Use cases that aren't boring

- AI assistants and RAG workflows that need fresh context
- Finance dashboards that actually stay current
- Media trend detection — who's talking about what and when
- Analyst workflows that want enrichment-ready JSON, not raw HTML

## Related

- [API Reference](/api/beans)
- [Cafecito](https://cafecito.tech)
- [Pricing](/pricing)
- [Contact](/contact)
