System Diagrams
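This page gathers SeMu-GPT 2026's component layout, data flows, infrastructure topology, and deployment pipeline in one place. For terminology and component responsibilities, see the Architecture Overview page.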
1. Overall Component Diagram
graph TD
subgraph Users["Users"]
U["Web browser<br/>desktop / mobile"]
end
subgraph EdgeDev["dev edge"]
CFW["Cloudflare Workers<br/>semu-chat-dev"]
CFT["Cloudflare Tunnel<br/>semu-gpt-dev"]
end
subgraph EdgeProd["prod edge (soft launch)"]
ALB["AWS ALB<br/>semu-gpt-alb"]
CFLEG["CloudFront<br/>legacy apex/www"]
end
subgraph FE["Frontend (Next.js 16)"]
APP["App Router<br/>chat / payments / admin"]
SSE["SSE Client<br/>receives RAG stream"]
end
subgraph BE["Backend (Spring Boot 3.1)"]
CTRL["Controllers<br/>Auth·Account·Conversation·<br/>Toss·Admin·Reference"]
SVC["Services<br/>StreamingConversationService<br/>StreamingRagProcessor<br/>HydeService"]
REPO["Repositories<br/>JPA + ES"]
SEC["Security<br/>JWT TokenProvider/Parser"]
end
subgraph Stores["Data stores"]
MY[("MySQL<br/>account·membership·payment·<br/>conversation·coupon")]
ES[("Elasticsearch 8.17<br/>tax-law·tax-precedent·<br/>tax-tribunal·tax-counsel·tax-threeway")]
RD[("Redis 7<br/>cache")]
end
subgraph Pipeline["Data Pipeline (Python CLI)"]
COL["Collectors<br/>National Tax Service · Ministry of Government Legislation · 찾아줘세무사 · Tax Tribunal"]
IDX["Indexers<br/>ES bulk + embeddings"]
end
subgraph Ext["External services"]
OAI["OpenAI<br/>GPT-5 / Embedding"]
LF[Langfuse]
TOSS[Toss Payments]
SENS["Naver Cloud SENS<br/>SMS"]
SEN[Sentry]
end
U --> CFW
U --> ALB
U --> CFLEG
CFW --> APP
ALB --> APP
APP <--> SSE
APP -- HTTP REST --> CTRL
SSE -- SSE --> CTRL
CFT -- :8080 --> CTRL
ALB -- :8080 --> CTRL
CTRL --> SEC
CTRL --> SVC
SVC --> REPO
REPO --> MY
REPO --> ES
SVC --> RD
SVC -- prompt fetch --> LF
SVC -- chat / embedding --> OAI
CTRL -- confirm/webhook --> TOSS
SVC -- SMS --> SENS
APP -- errors --> SEN
COL -- JSONL --> IDX
IDX -- bulk index --> ES
IDX -- embedding --> OAI
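The Security box pairs a JWT TokenProvider (issuing tokens) with a Parser (validating the Authorization: Bearer header on each request). Below is a minimal standard-library sketch of that round trip. It assumes HS256 signing. The real algorithm, claim set, and key handling in the Spring Boot code are not documented on this page, and sign_hs256 and parse_bearer are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    # JWT segments are base64url without "=" padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """TokenProvider side (assumed HS256): header.payload.signature."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def parse_bearer(authorization: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Parser side: validate an 'Authorization: Bearer <jwt>' value and return its claims."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        raise ValueError("missing or malformed Bearer token")
    header_b64, payload_b64, sig_b64 = token.split(".")
    # Pin the algorithm instead of trusting whatever the token header claims.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims


token = sign_hs256({"sub": "42", "exp": 2_000_000_000}, b"dev-secret")
print(parse_bearer("Bearer " + token, b"dev-secret", now=1_700_000_000))
# {'sub': '42', 'exp': 2000000000}
```

The parser checks the expected algorithm itself rather than reading it from the token. This avoids the well-known alg-confusion class of JWT bugs.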
2. RAG Question-Answering Flow (SSE Streaming)
End-to-end flow when POST /conversations/stream or POST /conversations/{id}/turns/stream is called.
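For the reference retrieval, filtering, and supplementation logic, see the "RAG 참고자료 아키텍처" (RAG reference architecture) section of CLAUDE.md in the project root.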
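Two of the retrieval steps in the sequence below can be sketched in isolation. The first is Reciprocal Rank Fusion of the BM25 and kNN rankings, where each ranking adds 1/(k + rank) to a document's score. The second is the "max 5/type" cap. This is a minimal sketch and not StreamingRagProcessor itself:

- Category backfill, tier sort, and peripheral-law removal are omitted.
- k = 60 is a commonly used default, not a value taken from this project.
- Whether fusion runs inside Elasticsearch or in the application is not stated here.
- Document ids such as law:1 and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

RRF_K = 60        # commonly used RRF constant; assumed, not this project's setting
MAX_PER_TYPE = 5  # the "max 5/type" cap from the diagram


def rrf_fuse(rankings: List[List[str]], k: int = RRF_K) -> Dict[str, float]:
    """Reciprocal Rank Fusion: each ranking adds 1 / (k + rank) for every doc it lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


def select_references(bm25: List[str], knn: List[str], doc_type: Dict[str, str]) -> List[str]:
    """Fuse both rankings, then keep at most MAX_PER_TYPE hits per index type."""
    fused = rrf_fuse([bm25, knn])
    # fused has one entry per id, so a doc found by both retrievers appears once (dedup).
    ordered = sorted(fused, key=lambda d: (-fused[d], d))
    kept: List[str] = []
    per_type: Dict[str, int] = defaultdict(int)
    for doc_id in ordered:
        t = doc_type[doc_id]
        if per_type[t] < MAX_PER_TYPE:
            per_type[t] += 1
            kept.append(doc_id)
    return kept


bm25 = ["law:1", "prec:7", "law:2"]
knn = ["prec:7", "law:3", "law:1"]
types = {"law:1": "law", "law:2": "law", "law:3": "law", "prec:7": "precedent"}
print(select_references(bm25, knn, types))  # ['prec:7', 'law:1', 'law:3', 'law:2']
```

In the example, prec:7 outranks law:1 even though BM25 put law:1 first. RRF rewards documents that both retrievers rank well, and it uses only ranks, so BM25 and kNN scores never need to be on a common scale.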
sequenceDiagram
autonumber
participant U as User
participant FE as Frontend
participant API as Backend (StreamingConversationController)
participant SVC as StreamingConversationService
participant RAG as StreamingRagProcessor
participant LF as Langfuse
participant LLM as OpenAI
participant ES as Elasticsearch
participant DB as MySQL
U->>FE: Enter question
FE->>API: POST /conversations/stream<br/>Authorization: Bearer (JWT)
API->>SVC: streamAnswer(question, sessionId?)
SVC->>DB: Create or look up ConversationSession
SVC->>LF: getPromptWithConfig(hyde-generator)
SVC->>LLM: Generate HyDE hypothetical answer
LLM-->>SVC: Hypothetical answer text
SVC->>LLM: text-embedding-3-large
LLM-->>SVC: vector(3072)
SVC->>RAG: search(question, vector, taxCategory)
RAG->>ES: BM25 + kNN (RRF fusion)<br/>tax-law / precedent / tribunal / counsel
ES-->>RAG: hits per index
RAG->>RAG: dedup → category backfill →<br/>tier sort → peripheral-law removal → max 5/type
RAG-->>SVC: ranked references
SVC->>LF: getPromptWithConfig(rag-final-answer)
SVC->>LLM: ChatCompletion (stream=true)
loop Token stream
LLM-->>SVC: token chunk
SVC-->>FE: SSE: data chunk
FE-->>U: Progressive rendering
end
LLM-->>SVC: Answer complete + id1,id2
SVC->>ES: supplementThreewayReferences(article_keys)
SVC->>ES: supplementReferences(extracted from answer text)
SVC->>SVC: reorderByCitations()
SVC-->>FE: SSE: citation_update
SVC->>DB: ConversationTurn persist (question, answer, refs)
SVC-->>FE: SSE: done
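The SSE: data chunk, citation_update, and done steps are Server-Sent Events frames written to a single HTTP response. Below is a minimal sketch of encoding those frames and parsing them incrementally, following the event:/data:/blank-line rules of the SSE format. The event name chunk and the JSON shape of the citation_update payload are assumptions, because this page shows only the order of events.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def format_sse(event: str, payload: str) -> str:
    """Server side: one frame; each payload line becomes its own 'data:' line."""
    lines = [f"event: {event}"] + [f"data: {line}" for line in payload.split("\n")]
    return "\n".join(lines) + "\n\n"  # the blank line ends the frame


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Client side: yield (event, data) pairs as frames complete (simplified SSE rules)."""
    event: str = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # blank line: dispatch whatever has been buffered
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # 'id' and 'retry' fields are ignored in this sketch


frames = (format_sse("chunk", "Hello") + format_sse("chunk", " world")
          + format_sse("citation_update", json.dumps({"refs": ["law:1"]}))
          + format_sse("done", ""))
answer, refs = "", []
for event, data in parse_sse(frames.splitlines(keepends=True)):
    if event == "chunk":
        answer += data  # progressive rendering
    elif event == "citation_update":
        refs = json.loads(data)["refs"]
    elif event == "done":
        break
print(answer, refs)  # Hello world ['law:1']
```

Both streaming endpoints are POST, and the browser's built-in EventSource only issues GET requests. The SSE client therefore has to read the fetch response body and run a parser like this one over it.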
3. Payment Flow (Toss V2: One-Time Payment + Webhook)
For detailed business rules, see the Payments (Toss V2) page.
sequenceDiagram
autonumber
participant U as User
participant FE as Frontend
participant TW as Toss Widget (browser SDK)
participant API as Backend
participant TOSS as Toss Server
participant DB as MySQL
U->>FE: Select membership plan
FE->>TW: tossPayments.requestPayment(...)<br/>NEXT_PUBLIC_TOSS_CLIENT_KEY
TW->>U: Payment window (card / bank transfer / easy pay)
U->>TW: Enter payment method
TW->>FE: success → /payment/success?paymentKey&orderId&amount
FE->>API: POST /payments/confirm<br/>{ paymentKey, orderId, amount }
API->>TOSS: POST /v1/payments/confirm<br/>Basic auth (TOSS_SECRET_KEY)
TOSS-->>API: { status: "DONE", method, totalAmount, ... }
API->>DB: Payment.save(SUCCESS)
API->>DB: Membership.activate(plan, period)
API-->>FE: { paymentId, membership }
FE-->>U: Payment complete screen
Note over TOSS,API: Asynchronous webhook (with retries)
TOSS->>API: POST /webhooks/tosspayments<br/>(TossWebhookController)
API->>API: Verify signature
API->>DB: Sync Payment status<br/>(cancel, partial cancel, failure)
API-->>TOSS: 200 OK
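The POST /v1/payments/confirm step is Toss Payments' confirm API. It sends the secret key as HTTP Basic credentials, base64("secret key:") with an empty password, and puts paymentKey, orderId, and amount in a JSON body.

Below is a minimal sketch that builds that request without sending it. It also includes an idempotent status sync for the webhook path. Several parts are illustrative and not taken from the backend:

- the internal state names other than SUCCESS;
- the mapping table;
- the assumption that the webhook body carries the Toss Payment status.

```python
import base64
import json
import urllib.request

TOSS_CONFIRM_URL = "https://api.tosspayments.com/v1/payments/confirm"


def build_confirm_request(secret_key: str, payment_key: str, order_id: str,
                          amount: int, expected_amount: int) -> urllib.request.Request:
    """Build (but do not send) the server-side confirm call."""
    # amount comes back through the browser redirect, so compare it with the
    # amount stored server-side for orderId before confirming.
    if amount != expected_amount:
        raise ValueError(f"amount mismatch for order {order_id}")
    # Basic auth: secret key as the user name, empty password.
    credentials = base64.b64encode(f"{secret_key}:".encode()).decode()
    body = json.dumps({"paymentKey": payment_key, "orderId": order_id, "amount": amount})
    return urllib.request.Request(
        TOSS_CONFIRM_URL,
        data=body.encode(),
        headers={"Authorization": f"Basic {credentials}", "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical mapping from Toss Payment status values to internal states.
STATUS_MAP = {
    "DONE": "SUCCESS",
    "PARTIAL_CANCELED": "PARTIAL_CANCELED",
    "CANCELED": "CANCELED",
    "ABORTED": "FAILED",
    "EXPIRED": "FAILED",
}
TERMINAL = {"CANCELED", "FAILED"}


def sync_from_webhook(current: str, toss_status: str) -> str:
    """Return the new internal state; calling it again for a retried webhook is harmless."""
    target = STATUS_MAP.get(toss_status)
    if target is None or current in TERMINAL:
        # Non-final statuses (READY, IN_PROGRESS, WAITING_FOR_DEPOSIT) and
        # late events after a terminal state leave the record unchanged.
        return current
    return target
```

The webhook handler is a pure function of (current state, reported status). That keeps Toss's retries harmless: replaying the same event yields the same state, and a late DONE cannot revive a canceled payment.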
4. Data Pipeline Indexing Flow
Offline CLI jobs. For the detailed steps, see the Data Pipeline page.
sequenceDiagram
autonumber
participant DEV as Developer (local)
participant COL as Collector (Python)
participant SRC as External sites<br/>(National Tax Service, Ministry of Government Legislation, etc.)
participant FS as Local filesystem<br/>(data/{source}/*.jsonl)
participant IDX as Indexer (Python)
participant OAI as OpenAI Embedding
participant ES as Elasticsearch
DEV->>COL: uv run semugpt-collect {source}<br/>--max-items N
loop Per page
COL->>SRC: HTTP / Selenium fetch
SRC-->>COL: HTML / JSON / PDF
COL->>COL: parse + normalize
COL->>FS: append JSONL + save .progress
end
DEV->>IDX: uv run semugpt-index bulk<br/>--index {source} --es-url ... --embed
IDX->>FS: read JSONL (resumable)
loop batch
IDX->>OAI: text-embedding-3-large<br/>(content)
OAI-->>IDX: vector(3072)
IDX->>ES: _bulk index<br/>tax-{source}
end
ES-->>IDX: indexed count
IDX-->>DEV: Progress + statistics
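The indexer loop above (read JSONL, embed content in batches, _bulk into tax-{source}) can be sketched as follows. The Elasticsearch _bulk body is NDJSON: one action line and one source line per document, ending with a newline.

The sketch makes these assumptions:

- Documents carry an id field.
- Vectors are stored under a hypothetical embedding field.
- Resumability comes from a hypothetical line-offset checkpoint file that is written after each successful batch. The .progress file in the diagram belongs to the collector.

embed and send_bulk are passed in as functions, so the sketch runs without OpenAI or Elasticsearch.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, List, Tuple


def read_batches(jsonl: Path, checkpoint: Path, batch_size: int) -> Iterator[Tuple[int, List[dict]]]:
    """Yield (last_line_no, docs), skipping lines an earlier run already indexed."""
    done = int(checkpoint.read_text()) if checkpoint.exists() else 0
    batch: List[dict] = []
    line_no = 0
    with jsonl.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if line_no <= done or not line.strip():
                continue
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield line_no, batch
                batch = []
    if batch:
        yield line_no, batch


def bulk_body(index: str, docs: List[dict]) -> str:
    """_bulk NDJSON: an action line plus a source line per doc, newline-terminated."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc, ensure_ascii=False))  # keep Korean text readable
    return "\n".join(lines) + "\n"


def index_file(jsonl: Path, checkpoint: Path, index: str,
               embed: Callable[[List[str]], List[List[float]]],
               send_bulk: Callable[[str], object], batch_size: int = 100) -> int:
    """Embed and bulk-index every not-yet-indexed line; return how many docs were sent."""
    total = 0
    for last_line, docs in read_batches(jsonl, checkpoint, batch_size):
        vectors = embed([d["content"] for d in docs])  # e.g. text-embedding-3-large, 3072 dims
        for doc, vec in zip(docs, vectors):
            doc["embedding"] = vec  # hypothetical field name
        send_bulk(bulk_body(index, docs))
        checkpoint.write_text(str(last_line))  # advance only after the batch is accepted
        total += len(docs)
    return total
```

_bulk returns HTTP 200 even when individual items fail. A real send_bulk should therefore check the response's errors flag before letting the checkpoint advance.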
5. Infrastructure Topology (dev)
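semu-gpt-dev.bootalk.co.kr (backend) and semu-chat-dev.bootalk.co.kr (frontend) run on different infrastructure: the frontend on Cloudflare Workers, the backend on AWS Lightsail behind a Cloudflare Tunnel.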
graph TD
USER["Developer / client<br/>web browser"]
subgraph CF["Cloudflare"]
CFDNS["DNS: bootalk.co.kr zone"]
CFW["Workers<br/>semugpt-frontend-dev"]
CFKV["KV<br/>Next.js static assets"]
CFTUN["Tunnel: semugpt-backend<br/>id 078d8083-..."]
end
subgraph AWS["AWS Lightsail (account 023888247019, ap-northeast-2a)"]
LS["Instance: semugpt-backend<br/>medium_3_0 / 4GB / 80GB SSD<br/>Static IP 3.39.17.132"]
subgraph SystemD["systemd services"]
CFD[cloudflared]
BE["semugpt-backend<br/>Spring Boot :8080"]
HM[health-monitor cron]
DA[disk-alarm cron]
end
subgraph Docker["Docker Compose"]
MY[("mysql:8.0<br/>:3306")]
ES[("semugpt-es:8.17.0-nori<br/>:9200")]
RD[("redis:7-alpine<br/>:6379")]
end
SLACK["Slack Webhook<br/>alert delivery"]
end
USER -- semu-chat-dev --> CFW
CFW --> CFKV
CFW -- NEXT_PUBLIC_API_URL --> CFTUN
USER -- semu-gpt-dev --> CFTUN
CFDNS -.-> CFW
CFDNS -.-> CFTUN
CFTUN -- :8080 --> CFD
CFD --> BE
BE --> MY
BE --> ES
BE --> RD
HM --> SLACK
DA --> SLACK
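The health-monitor and disk-alarm cron jobs both report to a Slack incoming webhook, which accepts a JSON body with a text field. Below is a minimal sketch of the disk side. The 85% threshold, the message format, and the function names are hypothetical. The send function can be swapped out, so the check can be exercised without posting to Slack.

```python
import json
import shutil
import socket
import urllib.request
from typing import Callable

DISK_THRESHOLD = 0.85  # hypothetical: alert above 85% used


def disk_usage_ratio(path: str = "/") -> float:
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def build_slack_alert(webhook_url: str, host: str, ratio: float) -> urllib.request.Request:
    """Slack incoming webhooks take a JSON body with a 'text' field."""
    text = f"[disk-alarm] {host}: disk {ratio:.0%} used (threshold {DISK_THRESHOLD:.0%})"
    return urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_disk(webhook_url: str, path: str = "/",
               send: Callable[[urllib.request.Request], object] = urllib.request.urlopen) -> bool:
    """Post an alert if usage is over the threshold; return whether one was sent."""
    ratio = disk_usage_ratio(path)
    if ratio < DISK_THRESHOLD:
        return False
    send(build_slack_alert(webhook_url, socket.gethostname(), ratio))
    return True
```

When cron runs this every few minutes, it alerts on every run for as long as the disk stays above the threshold.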
6. Infrastructure Topology (prod, soft launch, Issue #151)
The legacy tax-gpt and the new semugpt-2026 run side by side, sharing the same ALB and RDS instance.
graph TD
LEGUSER["Legacy users<br/>semugpt.co.kr / www / api / pro"]
NEWUSER["New users<br/>new.semugpt.co.kr<br/>api-new.semugpt.co.kr"]
R53["Route 53<br/>semugpt.co.kr zone"]
subgraph LegFront["Legacy frontend (unchanged)"]
CFRONT["CloudFront × 2<br/>EQH9... / EMX0..."]
S3[("S3 buckets<br/>semugpt.co.kr<br/>semugpt-hosting")]
end
subgraph ALBLayer["ALB layer (reused)"]
ALB["semu-gpt-alb<br/>HTTPS:443 ACM *.semugpt.co.kr"]
TGLEG["TG semu-gpt-instance<br/>:80"]
TGNEWBE["TG-backend-2026<br/>:8080 priority 100"]
TGNEWFE["TG-frontend-2026<br/>:3000 priority 110"]
end
subgraph LegBE["Legacy backend"]
EC2["EC2 i-07aea... t2.medium<br/>tax-gpt Spring Boot :80"]
end
subgraph NewBE["New Lightsail prod (planned)"]
LSP["semugpt-prod<br/>large_3_0 / 8GB / $44/mo"]
SBE[systemd: semugpt-backend :8080]
SFE[systemd: semugpt-frontend :3000]
DES[(Docker: ES + Redis)]
end
subgraph RDSLayer["RDS (shared)"]
RDS[("tax-gpt MySQL 8.0.44<br/>db.t3.micro / 20GB")]
DBOLD[("database tax_gpt<br/>legacy, live")]
DBNEW[("database semugpt_2026<br/>account 324 + phone 351 + membership 642")]
end
LEGUSER --> R53
NEWUSER --> R53
R53 -- apex/www --> CFRONT
CFRONT --> S3
R53 -- api/pro --> ALB
R53 -- new/api-new --> ALB
ALB --> TGLEG
ALB --> TGNEWBE
ALB --> TGNEWFE
TGLEG --> EC2
TGNEWBE --> SBE
TGNEWFE --> SFE
SBE --> DES
SBE --> RDS
EC2 --> RDS
RDS --- DBOLD
RDS --- DBNEW
LSP --- SBE
LSP --- SFE
LSP --- DES
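Caution: the legacy RDS security group sg-09b20a06... allows 0.0.0.0/0 on port 3306, so the database is reachable from the internet. It is scheduled to be locked down after the hard cutover (see "결정/미해결 사항" (Decisions / open issues) in CLAUDE.md).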
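An ALB evaluates listener rules in ascending priority order and falls through to its default action. That is how the new target groups (priority 100 and 110) coexist with the legacy one. Below is a sketch of that evaluation. The host-to-target-group mapping is an assumption, because the diagram shows only the priorities: api-new goes to the backend TG, new goes to the frontend TG, and everything else falls through to the legacy TG as the default. Host matching is case-insensitive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    priority: int
    hosts: List[str]  # host-header condition values, lowercase
    target_group: str


# Priorities are from the diagram; the host-to-target-group mapping is assumed.
RULES = [
    Rule(110, ["new.semugpt.co.kr"], "TG-frontend-2026"),
    Rule(100, ["api-new.semugpt.co.kr"], "TG-backend-2026"),
]
DEFAULT_TG = "semu-gpt-instance"  # assumed default action: legacy tax-gpt


def route(host: str, rules: List[Rule] = RULES, default_tg: str = DEFAULT_TG) -> str:
    """Lowest priority number is evaluated first; the first match wins, else the default."""
    host = host.lower()  # host matching ignores case
    for rule in sorted(rules, key=lambda r: r.priority):
        if host in rule.hosts:
            return rule.target_group
    return default_tg


for h in ["api-new.semugpt.co.kr", "New.SemuGPT.co.kr", "api.semugpt.co.kr"]:
    print(h, "->", route(h))
```

The first matching rule wins. If the rules ever overlap, the more specific backend rule needs to keep the lower priority number.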
7. Deployment Pipeline
graph LR
DEV[Developer]
GH["GitHub<br/>uitiorg/semugpt-2026"]
subgraph FrontPath["Frontend deploy (automatic)"]
GHA["GitHub Actions<br/>deploy-frontend-dev.yml"]
BUILD["pnpm build:cf<br/>opennextjs-cloudflare"]
DEPLOY[wrangler deploy]
WORKER["Cloudflare Worker<br/>semugpt-frontend-dev"]
end
subgraph BackPath["Backend deploy (manual)"]
SSH[ssh semugpt-aws]
PULL[git pull origin develop]
GRADLE["Gradle bootJar (automatic)"]
SYSCTL["sudo systemctl restart<br/>semugpt-backend"]
end
subgraph DataPath["Data pipeline (local CLI)"]
UVCOL[uv run semugpt-collect ...]
UVIDX[uv run semugpt-index bulk ...]
ESDIRECT[("Elasticsearch<br/>dev / prod")]
end
DEV -- git push develop --> GH
GH -->|"push event<br/>paths: apps/frontend/**"| GHA
GHA --> BUILD
BUILD --> DEPLOY
DEPLOY --> WORKER
DEV --> SSH
SSH --> PULL
PULL --> GRADLE
GRADLE --> SYSCTL
DEV --> UVCOL
UVCOL --> UVIDX
UVIDX --> ESDIRECT
For deployment triggers, environment variables, and credentials, see the Production Deployment and Development Environment pages.
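The automatic frontend path fires on a push to develop that touches apps/frontend/**. Below is a sketch of that decision, approximating the glob semantics of GitHub Actions path filters: * does not cross a /, while ** does. The sketch leaves out other pattern characters, negated patterns, and any extra triggers in deploy-frontend-dev.yml. The function names are hypothetical.

```python
import re
from typing import Iterable, Sequence


def glob_to_regex(pattern: str):
    """Approximate a GitHub Actions filter glob: '**' crosses '/', '*' does not."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def should_deploy(branch: str, changed_files: Iterable[str],
                  paths: Sequence[str] = ("apps/frontend/**",)) -> bool:
    """True if a push to develop changed at least one file matching a paths filter."""
    if branch != "develop":
        return False
    regexes = [glob_to_regex(p) for p in paths]
    return any(rx.match(f) for f in changed_files for rx in regexes)


print(should_deploy("develop", ["apps/frontend/src/app/page.tsx"]))  # True
print(should_deploy("develop", ["apps/backend/build.gradle"]))       # False
```

Backend-only pushes leave the Worker untouched, which is consistent with the backend being deployed manually over SSH.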