ANOFM Scraper

PURPOSE

START -> Solr job core -> companies.json -> Check scraped -> Dispatch workflow -> GitHub Actions -> OpenCode scrape -> Solr updated -> DONE
CURRENT STATUS:
1422 / 5171 companies scraped today

Step 1: Extract

From Solr, all companies whose job URL contains anofm

Step 2: Check

See what you have already dispatched today

Step 3: Dispatch

Workflow: opencode_scraper_to_solr.yml

Step 4: Batch

~27 companies -> commit -> continue

Step 5: Clean up

Delete the completed runs

Step 6: Tomorrow

New day = re-extract + start over

Full Process

Step 1: Extract the companies from Solr

curl -s -u solr:SolrRocks "https://solr.pevitor.ro/solr/job/select?q=url:*anofm*&rows=0&facet=true&facet.field=company&facet.limit=10000"
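The facet response returned by the curl call above can be turned into a plain company list. A minimal sketch in Node.js (the language of scrape_remaining.js), assuming Solr's default JSON facet layout, where facet_counts.facet_fields.company is a flat array alternating name and count. parseFacetCompanies is a hypothetical helper, not something taken from the repository, and the sample response is made up.

```javascript
// Hypothetical helper: converts Solr's flat facet array
// ["Company A", 12, "Company B", 0, ...] into [{ company, count }, ...].
function parseFacetCompanies(solrResponse) {
  const flat = solrResponse?.facet_counts?.facet_fields?.company ?? [];
  const companies = [];
  for (let i = 0; i + 1 < flat.length; i += 2) {
    const company = flat[i];
    const count = flat[i + 1];
    // Without facet.mincount=1, Solr can also return terms whose count is 0
    // for the current query, so those are dropped here.
    if (count > 0) companies.push({ company, count });
  }
  return companies;
}

// Made-up response body, for illustration only.
const sampleFacetResponse = {
  facet_counts: {
    facet_fields: { company: ["ACME SRL", 3, "EXAMPLE SA", 0, "DEMO SRL", 1] },
  },
};
console.log(parseFacetCompanies(sampleFacetResponse).map((c) => c.company));
```

Adding facet.mincount=1 to the query would let Solr do the zero-count filtering itself.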

Step 2: Check what has already been scraped today

gh api "repos/peviitor-ro/peviitor_opencode_AI_scrapers/actions/runs?per_page=100" -q '.workflow_runs[] | select(.created_at >= "2026-04-11T20:00:00Z") | .id' | wc -l
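The same count can be sketched in Node.js on the JSON that the runs endpoint returns. This assumes the response shape the jq filter above relies on (workflow_runs[] entries with created_at). countRunsSince is a hypothetical helper and the sample runs are made up. Note that per_page=100 is the largest page size the GitHub API accepts, so a single request cannot report more than 100 runs; a higher count needs further pages.

```javascript
// Hypothetical helper: counts runs created at or after the cutoff timestamp.
function countRunsSince(runsResponse, cutoffIso) {
  const cutoff = Date.parse(cutoffIso);
  if (Number.isNaN(cutoff)) throw new Error(`Invalid cutoff: ${cutoffIso}`);
  return (runsResponse.workflow_runs ?? []).filter(
    (run) => Date.parse(run.created_at) >= cutoff
  ).length;
}

// Made-up sample of one API page, for illustration only.
const sampleRunsPage = {
  workflow_runs: [
    { id: 1, created_at: "2026-04-11T19:59:59Z" },
    { id: 2, created_at: "2026-04-11T20:00:00Z" },
    { id: 3, created_at: "2026-04-12T06:30:00Z" },
  ],
};
console.log(countRunsSince(sampleRunsPage, "2026-04-11T20:00:00Z")); // prints 2
```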

Step 3: Dispatch the companies for scraping

Workflow: .github/workflows/opencode_scraper_to_solr.yml in peviitor-ro/peviitor_opencode_AI_scrapers

gh workflow run .github/workflows/opencode_scraper_to_solr.yml -f company='COMPANY_NAME'
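Company names can contain spaces, quotes or apostrophes, which makes shell quoting fragile when dispatching in a loop. A minimal sketch that avoids the shell entirely, assuming the gh CLI is installed and authenticated and that the script runs inside a checkout of the repository, as the command above does. buildDispatchArgs and dispatchCompany are hypothetical names, and the company name in the example is made up.

```javascript
const { execFileSync } = require("child_process");

const WORKFLOW_FILE = ".github/workflows/opencode_scraper_to_solr.yml";

// Builds the argument vector for `gh workflow run`. No shell is involved,
// so the company name is passed to gh verbatim.
function buildDispatchArgs(company) {
  if (typeof company !== "string" || company.trim() === "") {
    throw new Error("company must be a non-empty string");
  }
  return ["workflow", "run", WORKFLOW_FILE, "-f", `company=${company}`];
}

// Dispatches the workflow for one company (requires gh; not called here).
function dispatchCompany(company) {
  execFileSync("gh", buildDispatchArgs(company), { stdio: "inherit" });
}

// Shows the arguments that would be passed for a made-up company name.
console.log(buildDispatchArgs("O'NEILL & SONS SRL"));
```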

Step 4: Dispatch a batch + clean up

4a. Dispatch ~27 companies
4b. Save scraped_today.json + commit
4c. Delete 1 page of completed runs
4d. Repeat

# Batch + commit
node scrape_remaining.js
git add scraped_today.json && git commit -m "Track X companies" && git push

# Clean up 1 page of completed runs
gh api ".../actions/runs?per_page=100" -q '.workflow_runs[] | select(.status == "completed") | .id' | while read -r id; do gh api -X DELETE ".../actions/runs/$id"; done
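The batch selection (the ~27 companies dispatched per round) can be sketched as follows. The contents of scrape_remaining.js and the format of scraped_today.json are not shown here, so this assumes the tracking file holds a JSON array of company names. nextBatch is a hypothetical helper, the default size of 27 is taken from the "~27" above, and the names in the example are made up.

```javascript
// Hypothetical helper: returns the next batch of companies that have not
// been dispatched yet today, preserving the order of allCompanies.
function nextBatch(allCompanies, scrapedToday, batchSize = 27) {
  const done = new Set(scrapedToday);
  const batch = [];
  for (const company of allCompanies) {
    if (batch.length >= batchSize) break;
    if (!done.has(company)) {
      batch.push(company);
      done.add(company); // also skips duplicates within allCompanies
    }
  }
  return batch;
}

// Made-up names, for illustration only.
const allNamesExample = ["A SRL", "B SRL", "C SRL", "D SRL", "B SRL"];
const scrapedTodayExample = ["A SRL"];
console.log(nextBatch(allNamesExample, scrapedTodayExample, 2)); // [ 'B SRL', 'C SRL' ]
```

Appending each dispatched batch to scraped_today.json before the commit keeps the next round from selecting the same companies again.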

Step 5: Cleaning up completed runs

# Delete the completed runs (not the active/queued ones)
gh api "repos/peviitor-ro/peviitor_opencode_AI_scrapers/actions/runs?per_page=100" -q '.workflow_runs[] | select(.status == "completed") | .id' | while read -r id; do gh api -X DELETE "repos/peviitor-ro/peviitor_opencode_AI_scrapers/actions/runs/$id"; done

Step 6: A new day = a new session

Delete scraped_today.json, create a NEW file, and re-extract the companies from Solr.