<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

 <title>TIL</title>
 <link href="https://pyg410.github.io/atom.xml" rel="self"/>
 <link href="https://pyg410.github.io/"/>
 <updated>2026-05-14T20:24:35+09:00</updated>
 <id>https://pyg410.github.io</id>
 <author>
   <name>박영규</name>
   <email></email>
 </author>

 
 <entry>
   <title>Comparing self-hosted monitoring platforms for FastAPI + LangGraph AI agents (Langfuse vs Arize Phoenix)</title>
   <link href="https://pyg410.github.io/ai-engineering/2026/05/13/phoenix-vs-langfuse/"/>
   <updated>2026-05-13T00:00:00+09:00</updated>
   <id>https://pyg410.github.io/ai-engineering/2026/05/13/phoenix-vs-langfuse</id>
   <content type="html">&lt;p&gt;폐쇄망에서 FastAPI &amp;amp; LangGraph 를 활용한 AI Agent를 개발할 때, self-hosted 배포가 가능한 모니터링 플랫폼을 비교합니다.&lt;/p&gt;

&lt;h2 id=&quot;1-langfuse-vs-arize-phoenix-선택-기준&quot;&gt;1. Langfuse vs Arize Phoenix: selection criteria&lt;/h2&gt;

&lt;h3 id=&quot;langgraph-연동-방식&quot;&gt;LangGraph integration&lt;/h3&gt;

&lt;p&gt;Langfuse integrates with LangGraph through a callback handler.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;langfuse.callback&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CallbackHandler&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CallbackHandler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;invoke&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;callbacks&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;handler&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Phoenix uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LangChainInstrumentor&lt;/code&gt; from OpenInference. LangGraph has no dedicated instrumentation package; because it shares &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;langchain-core&lt;/code&gt; as its foundation, the LangChain instrumentation covers it.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;openinference.instrumentation.langchain&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;LangChainInstrumentor&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;phoenix.otel&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;register&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;tracer_provider&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;register&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;project_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my-llm-app&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;auto_instrument&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;기능-비교&quot;&gt;Feature comparison&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Item&lt;/th&gt;
      &lt;th&gt;Langfuse&lt;/th&gt;
      &lt;th&gt;Arize Phoenix&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Official LangGraph support&lt;/td&gt;
      &lt;td&gt;Callback handler&lt;/td&gt;
      &lt;td&gt;Via the LangChain instrumentor&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Self-host components&lt;/td&gt;
      &lt;td&gt;6 (web, worker, PG, ClickHouse, Redis, MinIO)&lt;/td&gt;
      &lt;td&gt;1 to 2 (app + DB)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Prompt version management&lt;/td&gt;
      &lt;td&gt;Built in&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Automatic LLM cost aggregation&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;Yes (SpanCostCalculator)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;RAG evaluation&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;Yes (a strength)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Ingestion protocol&lt;/td&gt;
      &lt;td&gt;Proprietary HTTP API&lt;/td&gt;
      &lt;td&gt;Standard OTLP (HTTP / gRPC)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;DB&lt;/td&gt;
      &lt;td&gt;ClickHouse + PostgreSQL&lt;/td&gt;
      &lt;td&gt;SQLite or PostgreSQL&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;External queue&lt;/td&gt;
      &lt;td&gt;Redis (BullMQ)&lt;/td&gt;
      &lt;td&gt;In-memory Queue&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;2-기본-개념-정리&quot;&gt;2. Basic concepts&lt;/h2&gt;

&lt;h3 id=&quot;trace와-span&quot;&gt;Trace and Span&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Trace&lt;/strong&gt;: the full flow of a single user request&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Span&lt;/strong&gt;: an individual unit of work within that flow (an LLM call, a tool call, a DB query, and so on)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each span records its start and end times, its duration, whether it succeeded or failed, and input/output metadata.&lt;/p&gt;

&lt;p&gt;When a LangGraph agent runs, each node is recorded as its own span.&lt;/p&gt;
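
&lt;p&gt;The trace/span relationship above can be sketched in plain Python. This is only an illustration under assumptions, not the data model of either SDK: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Span&lt;/code&gt; class, its fields, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;run_agent&lt;/code&gt; are hypothetical names made up for the example.&lt;/p&gt;

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical shape, not a real SDK class)."""
    name: str
    trace_id: str
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None
    ok: bool = True
    metadata: dict = field(default_factory=dict)

    def finish(self, ok: bool = True) -> None:
        self.end = time.monotonic()
        self.ok = ok

    @property
    def duration(self) -> float:
        # Zero until finish() has been called.
        if self.end is None:
            return 0.0
        return self.end - self.start


def run_agent(nodes: list) -> list:
    """Record one span per graph node; all spans share a single trace id."""
    trace_id = uuid.uuid4().hex
    spans = []
    for node in nodes:
        span = Span(name=node, trace_id=trace_id, metadata={"input": node})
        span.finish()
        spans.append(span)
    return spans


spans = run_agent(["retrieve", "call_llm", "call_tool"])
```

&lt;p&gt;Every span carries the same trace id, which is what lets a monitoring UI group the node-level spans back into one request.&lt;/p&gt;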

&lt;h3 id=&quot;ingestion&quot;&gt;Ingestion&lt;/h3&gt;

&lt;p&gt;Ingestion is the act of sending trace/span data from the SDK to the monitoring server.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;3-langfuse-self-hosted-아키텍처&quot;&gt;3. Langfuse self-hosted architecture&lt;/h2&gt;

&lt;h3 id=&quot;구성-요소&quot;&gt;Components&lt;/h3&gt;

&lt;p&gt;A self-hosted Langfuse v3 deployment consists of six services.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;├── langfuse-web     (port 3000: UI + API)
├── langfuse-worker  (port 3030: async processing)
├── ClickHouse       (8123, 9000: OLAP analytics DB)
├── MinIO / S3       (9090: object storage)
├── Redis            (6379: queue + cache)
└── PostgreSQL       (5432: transactional DB)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;각-구성-요소의-역할&quot;&gt;Role of each component&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Component&lt;/th&gt;
      &lt;th&gt;Role&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;langfuse-web&lt;/strong&gt; (Next.js)&lt;/td&gt;
      &lt;td&gt;Serves the UI console and handles all API requests&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;langfuse-worker&lt;/strong&gt; (Express)&lt;/td&gt;
      &lt;td&gt;Asynchronous event processing, LLM-as-a-Judge, batch export&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Transactional data such as users, organizations, projects, API keys, and prompts&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;ClickHouse&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;High-volume log data such as traces, observations, and scores&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;Redis&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;BullMQ event queue + API key/prompt cache&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;strong&gt;S3 / MinIO&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Raw event originals, multimodal media, batch export files&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;데이터-흐름&quot;&gt;Data flow&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SDK
 1. HTTP POST /api/public/ingestion
langfuse-web
 2. Stores the raw event in S3
 3. Enqueues only the S3 reference (path) in Redis
 4. Returns 207 (HTTP response)
langfuse-worker
 5. Dequeues the S3 reference from Redis
 6. Reads the actual data from S3 and enriches it (token/cost calculation, etc.)
 7. Writes to ClickHouse
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The key point is that &lt;strong&gt;web never writes to the database directly&lt;/strong&gt;. Only the S3 path (a reference) goes onto the Redis queue, while the actual data stays in S3. Because receiving and storing are decoupled, web can keep accepting requests even when traffic spikes.&lt;/p&gt;
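
&lt;p&gt;The seven steps above can be sketched with in-memory stand-ins. This is a simplified model under assumptions, not Langfuse code: a dict plays S3, a deque plays the Redis queue, a list plays ClickHouse, and the function names and the token-sum &quot;enrichment&quot; are hypothetical.&lt;/p&gt;

```python
import json
from collections import deque

blob_store = {}        # stands in for S3 / MinIO
queue = deque()        # stands in for the Redis (BullMQ) queue
analytics_rows = []    # stands in for ClickHouse


def web_ingest(event_id: str, event: dict) -> int:
    """Web side: persist the raw event, enqueue only its path, answer at once."""
    path = f"events/{event_id}.json"
    blob_store[path] = json.dumps(event)
    queue.append(path)             # the queue carries the reference, not the payload
    return 207


def worker_drain() -> int:
    """Worker side: resolve each reference, enrich the event, write the row."""
    processed = 0
    while queue:
        path = queue.popleft()
        event = json.loads(blob_store[path])
        # Placeholder enrichment; the real worker computes tokens and cost.
        event["total_tokens"] = event["input_tokens"] + event["output_tokens"]
        analytics_rows.append(event)
        processed += 1
    return processed


status = web_ingest("e1", {"input_tokens": 12, "output_tokens": 30})
pending_before = len(queue)
processed = worker_drain()
```

&lt;p&gt;Note that the web function returns before any analytics row exists; the row only appears once the worker drains the queue.&lt;/p&gt;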

&lt;h3 id=&quot;langfuse-web-vs-langfuse-worker-분리-이유&quot;&gt;Why langfuse-web and langfuse-worker are separate&lt;/h3&gt;

&lt;p&gt;Langfuse v3 adopted an event-driven backend architecture. HTTP requests from the SDK are accepted immediately, placed on a queue, and processed asynchronously. This lets the system handle more requests without putting database load on the request path.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;v2 consisted of a single container plus PostgreSQL, but tracing data at the scale of millions of rows became a bottleneck, so v3 added ClickHouse, Redis, S3, and the worker container.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;s3--minio-사용-목적&quot;&gt;What S3 / MinIO is used for&lt;/h3&gt;

&lt;p&gt;Langfuse uses S3 for three purposes.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Raw event storage (required)&lt;/strong&gt;: Stores the original events coming from the SDK as-is. This is a required setting in every deployment. It also acts as a recovery backup that prevents event loss while ClickHouse is temporarily unavailable.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Multimodal input/output (optional)&lt;/strong&gt;: When a trace includes base64-encoded media such as images or audio, the SDK automatically uploads it to S3 and leaves only a reference string in the trace.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Batch export (optional)&lt;/strong&gt;: Used for large CSV/JSON exports from the UI.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;MinIO is an open-source object storage server compatible with the S3 API; on AWS you can use S3 directly without MinIO. The Docker Compose / Helm deployments include MinIO by default.&lt;/p&gt;

&lt;h3 id=&quot;s3-데이터-삭제-정책&quot;&gt;S3 data deletion policy&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Data type&lt;/th&gt;
      &lt;th&gt;When it is deleted&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Raw events (text)&lt;/td&gt;
      &lt;td&gt;Configured directly through an S3 lifecycle policy (no default; Langfuse Cloud uses 30 days)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Multimodal media&lt;/td&gt;
      &lt;td&gt;Setting a policy is not recommended (deletion breaks references in the UI)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ClickHouse data&lt;/td&gt;
      &lt;td&gt;Configured per project with the Data Retention feature (minimum 3 days)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;When Data Retention is enabled, expired events are looked up in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;blob_storage_file_log&lt;/code&gt; table and the corresponding files in S3 are deleted along with them.&lt;/p&gt;

&lt;h3 id=&quot;redis-역할-상세&quot;&gt;Redis roles in detail&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;BullMQ event queue&lt;/strong&gt;: passes the S3 reference of each event received by web on to the worker&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;API key cache&lt;/strong&gt;: in-memory caching so that not every API call has to query the database (this is the API key you typically put in environment variables)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Prompt cache&lt;/strong&gt;: serves frequently used prompts quickly through a read-through cache&lt;/li&gt;
&lt;/ul&gt;
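
&lt;p&gt;As a rough illustration of the read-through caching mentioned in the list above, here is a minimal sketch. It assumes nothing about how Langfuse implements its cache: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ReadThroughCache&lt;/code&gt;, the TTL value, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;load_prompt_from_db&lt;/code&gt; are hypothetical, and a plain dict stands in for Redis.&lt;/p&gt;

```python
import time


class ReadThroughCache:
    """Hypothetical read-through cache: on a miss it loads from the backing store."""

    def __init__(self, loader, ttl_seconds: float = 60.0):
        self._loader = loader          # called only on a miss or after expiry
        self._ttl = ttl_seconds
        self._entries = {}             # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        now = time.monotonic()
        if entry is not None and entry[1] > now:
            return entry[0]            # hit: no database access
        value = self._loader(key)      # miss: read through to the store
        self._entries[key] = (value, now + self._ttl)
        return value


db_reads = []


def load_prompt_from_db(name):
    """Stand-in for a database query; records each call so reads can be counted."""
    db_reads.append(name)
    return f"prompt body for {name}"


cache = ReadThroughCache(load_prompt_from_db)
first = cache.get("summarize")
second = cache.get("summarize")   # served from the cache, no second DB read
```

&lt;p&gt;The caller only ever talks to the cache; whether the value came from memory or from the database is hidden behind &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get&lt;/code&gt;.&lt;/p&gt;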

&lt;hr /&gt;

&lt;h2 id=&quot;4-arize-phoenix-self-hosted-아키텍처&quot;&gt;4. Arize Phoenix self-hosted architecture&lt;/h2&gt;

&lt;h3 id=&quot;구성-요소-1&quot;&gt;Components&lt;/h3&gt;

&lt;p&gt;Phoenix runs as a &lt;strong&gt;single container&lt;/strong&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;├── Phoenix container (port 6006: UI + HTTP OTLP)
│                     (port 4317: gRPC OTLP)
└── SQLite or PostgreSQL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;데이터-흐름-1&quot;&gt;Data flow&lt;/h3&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;SDK (openinference instrumentation)
 1. OTLP (HTTP: 6006/v1/traces or gRPC: 4317)
Phoenix single container
 2. In-memory span queue (up to 20,000 spans)
 3. BulkInserter (groups spans into batches)
 4. SQLite or PostgreSQL
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Once the span queue reaches its 20,000-span limit, new span requests are answered with HTTP 429.&lt;/p&gt;
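
&lt;p&gt;The bounded-queue behavior can be sketched as follows. This is a toy model under assumptions, not Phoenix source code: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SpanQueue&lt;/code&gt; and its methods are hypothetical, and the limit and status code simply follow the description above.&lt;/p&gt;

```python
from collections import deque

MAX_QUEUE_SIZE = 20_000   # limit taken from the description above


class SpanQueue:
    """Hypothetical bounded in-memory queue with a batch drain."""

    def __init__(self, max_size: int = MAX_QUEUE_SIZE):
        self._items = deque()
        self._max_size = max_size

    def receive(self, span: dict) -> int:
        """Return an HTTP-style status: 200 if accepted, 429 if the queue is full."""
        if len(self._items) >= self._max_size:
            return 429
        self._items.append(span)
        return 200

    def drain(self, batch_size: int) -> list:
        """Pop up to batch_size spans, as a bulk inserter would before one DB write."""
        batch = []
        while self._items and len(batch) != batch_size:
            batch.append(self._items.popleft())
        return batch


q = SpanQueue(max_size=3)                       # tiny limit to make the effect visible
statuses = [q.receive({"id": i}) for i in range(4)]
batch = q.drain(batch_size=2)                   # frees two slots
after_drain = q.receive({"id": 4})
```

&lt;p&gt;Because the queue lives in the process memory of the single container, anything still queued is lost if that process dies, which is one reason the limit matters more under heavy traffic.&lt;/p&gt;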

&lt;h3 id=&quot;스토리지&quot;&gt;Storage&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;SQLite&lt;/strong&gt;: the default. Suited to local development and small deployments.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt;: recommended for production. Supports concurrent access and standard database tooling such as backup and replication.&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;5-구조-비교-요약&quot;&gt;5. Architecture comparison summary&lt;/h2&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Item&lt;/th&gt;
      &lt;th&gt;Langfuse&lt;/th&gt;
      &lt;th&gt;Phoenix&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Number of containers&lt;/td&gt;
      &lt;td&gt;6&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Queue type&lt;/td&gt;
      &lt;td&gt;Redis BullMQ (external)&lt;/td&gt;
      &lt;td&gt;In-memory (built in, 20k limit)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Analytics DB&lt;/td&gt;
      &lt;td&gt;ClickHouse&lt;/td&gt;
      &lt;td&gt;None (SQLite or PostgreSQL only)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Object storage&lt;/td&gt;
      &lt;td&gt;S3 / MinIO (required)&lt;/td&gt;
      &lt;td&gt;None&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Ingestion protocol&lt;/td&gt;
      &lt;td&gt;Proprietary HTTP API&lt;/td&gt;
      &lt;td&gt;Standard OTLP (HTTP / gRPC)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Operational complexity&lt;/td&gt;
      &lt;td&gt;High&lt;/td&gt;
      &lt;td&gt;Low&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Handling large-scale traffic&lt;/td&gt;
      &lt;td&gt;Favorable (event-driven + ClickHouse)&lt;/td&gt;
      &lt;td&gt;Unfavorable (20k queue limit)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Difficulty of getting started with self-hosting&lt;/td&gt;
      &lt;td&gt;High&lt;/td&gt;
      &lt;td&gt;Low (1 container)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;6-결론&quot;&gt;6. Conclusion&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For a production FastAPI + LangGraph service:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;If operational stability, real-time monitoring, and large-scale traffic matter most → &lt;strong&gt;Langfuse&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;If you want to get running quickly and focus on RAG evaluation → &lt;strong&gt;Phoenix&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;If you already have OpenTelemetry infrastructure → &lt;strong&gt;Phoenix&lt;/strong&gt; plugs in naturally (OTLP)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The two tools serve different purposes, so running both is an option. Splitting the work, with Phoenix for local/staging debugging and Langfuse for production monitoring, is also practical.&lt;/p&gt;

&lt;p&gt;Personally, after running both self-hosted, I preferred Langfuse's UI/UX because it offers more statistical charts.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;참고&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://langfuse.com/handbook/product-engineering/architecture&quot;&gt;Langfuse official architecture documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://langfuse.com/self-hosting&quot;&gt;Langfuse Self-hosting&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://langfuse.com/self-hosting/deployment/infrastructure/blobstorage&quot;&gt;Langfuse S3 / Blob Storage configuration&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://langfuse.com/self-hosting/upgrade/upgrade-guides/upgrade-v2-to-v3&quot;&gt;Langfuse v2 to v3 upgrade guide&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arize.com/docs/phoenix/self-hosting/architecture&quot;&gt;Phoenix self-hosting architecture&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://deepwiki.com/Arize-ai/phoenix/5.1-tracing-and-observability&quot;&gt;Phoenix Tracing &amp;amp; Observability&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-langchain&quot;&gt;OpenInference LangChain Instrumentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content>
 </entry>
 
 <entry>
   <title>How the Langfuse &amp; Phoenix SDKs work, their impact on FastAPI performance, and deployment caveats</title>
   <link href="https://pyg410.github.io/ai-engineering/2026/05/13/phoenix-and-langfuse-sdk/"/>
   <updated>2026-05-13T00:00:00+09:00</updated>
   <id>https://pyg410.github.io/ai-engineering/2026/05/13/phoenix-and-langfuse-sdk</id>
   <content type="html">&lt;p&gt;FastAPI + LangGraph 실서비스에 Langfuse 또는 Phoenix를 붙일 때 내부 동작을 이해하고 있으면 예상치 못한 문제를 피할 수 있습니다.
&lt;br /&gt;이 글은 두 SDK의 내부 구조, FastAPI에 미치는 성능 영향, 그리고 Gunicorn/Uvicorn 배포 환경에서의 주의사항을 정리합니다.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;1-sdk는-fastapi에-부하를-주는가&quot;&gt;1. Does the SDK put load on FastAPI?&lt;/h2&gt;

&lt;h3 id=&quot;langfuse-python-sdk&quot;&gt;Langfuse Python SDK&lt;/h3&gt;

&lt;p&gt;The Langfuse SDK works with a &lt;strong&gt;background thread + internal queue&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“It uses a worker Thread and an internal queue to manage requests to the Langfuse backend asynchronously. Hence, the SDK adds only minimal latency to your application.”&lt;/p&gt;

  &lt;p&gt;Source: &lt;a href=&quot;https://langfuse.com/docs/sdk/python&quot;&gt;Langfuse Python SDK official documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

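&lt;p&gt;When a callback fires, the span data is put on the internal queue and the call returns immediately. The actual network transmission is handled by a separate background thread.&lt;/p&gt;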

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;FastAPI request handling (main)
    ↓ create span → put on internal queue and return immediately
    ↓ return response

Background thread (separate)
    ↓ dequeue and send to the Langfuse server in batches
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The background thread sends a batch when either of two conditions is met.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Condition&lt;/th&gt;
      &lt;th&gt;Parameter&lt;/th&gt;
      &lt;th&gt;Default&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Event count&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flush_at&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;512&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Elapsed time&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;flush_interval&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;5 seconds&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Whichever condition is met first triggers the batch send.&lt;/p&gt;
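
&lt;p&gt;The &quot;whichever comes first&quot; rule can be sketched like this. It is a simplified, single-threaded model under assumptions, not the SDK's implementation: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Batcher&lt;/code&gt; is a hypothetical class, and an injectable clock replaces the real background thread so the timing can be shown deterministically.&lt;/p&gt;

```python
import time


class Batcher:
    """Hypothetical batcher: flush on size or on elapsed time, whichever comes first."""

    def __init__(self, send, flush_at=512, flush_interval=5.0, clock=time.monotonic):
        self._send = send
        self._flush_at = flush_at
        self._flush_interval = flush_interval
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def add(self, event):
        self._buffer.append(event)
        self.maybe_flush()

    def maybe_flush(self):
        """A real SDK would run this check inside its background thread loop."""
        full = len(self._buffer) >= self._flush_at
        overdue = self._clock() - self._last_flush >= self._flush_interval
        if self._buffer and (full or overdue):
            self._send(list(self._buffer))
            self._buffer.clear()
            self._last_flush = self._clock()


sent = []
fake_now = [0.0]   # mutable fake clock, advanced by hand below
batcher = Batcher(sent.append, flush_at=3, flush_interval=5.0, clock=lambda: fake_now[0])

for i in range(3):
    batcher.add(i)          # the third event reaches flush_at, so a batch is sent
batcher.add(3)              # buffered: neither condition holds yet
fake_now[0] = 6.0           # more than flush_interval has passed
batcher.maybe_flush()       # time-based flush sends the remaining event
```

&lt;p&gt;With low traffic the time condition dominates, so a span can sit in memory for up to the flush interval before it is sent; that is worth remembering for short-lived processes.&lt;/p&gt;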

&lt;blockquote&gt;
  &lt;p&gt;Reference: &lt;a href=&quot;https://langfuse.com/docs/observability/features/queuing-batching&quot;&gt;Langfuse Event Queuing/Batching documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;phoenix-opentelemetry-batchspanprocessor&quot;&gt;Phoenix (OpenTelemetry BatchSpanProcessor)&lt;/h3&gt;

&lt;p&gt;Phoenix uses OpenTelemetry's &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“The BatchSpanProcessor is the recommended implementation for production. It buffers spans in a queue and exports them in batches using a background worker thread.”&lt;/p&gt;

  &lt;p&gt;Source: &lt;a href=&quot;https://deepwiki.com/open-telemetry/opentelemetry-python/3.3-spanprocessor-and-pipeline&quot;&gt;OpenTelemetry Python SDK DeepWiki&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The trigger conditions are controlled through environment variables.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Condition&lt;/th&gt;
      &lt;th&gt;Environment variable&lt;/th&gt;
      &lt;th&gt;Default&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Span count&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OTEL_BSP_MAX_EXPORT_BATCH_SIZE&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;512&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Elapsed time&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;OTEL_BSP_SCHEDULE_DELAY&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;5000 ms (5 seconds)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;batch&lt;/code&gt; parameter of Phoenix's &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register()&lt;/code&gt; function defaults to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;False&lt;/code&gt;. In production you must set &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;batch=True&lt;/code&gt; explicitly. Otherwise &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SimpleSpanProcessor&lt;/code&gt; is used and every span is &lt;strong&gt;sent synchronously as soon as it ends&lt;/strong&gt;, which directly affects FastAPI response time.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Recommended production settings
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;phoenix.otel&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;register&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;tracer_provider&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;register&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my-app&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;batch&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# must be set explicitly
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;auto_instrument&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;Reference: &lt;a href=&quot;https://pypi.org/project/arize-phoenix-otel/&quot;&gt;arize-phoenix-otel PyPI&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;hr /&gt;

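&lt;h2 id=&quot;2-gil과-백그라운드-스레드&quot;&gt;2. The GIL and background threads&lt;/h2&gt;

&lt;p&gt;Because of Python's GIL (Global Interpreter Lock), only one thread executes CPU work at a time, no matter how many threads exist. However, most of what the background thread does is wait on network transmission (I/O), and the GIL is released automatically while a thread waits on I/O. GIL contention only occurs in the sections that actually use the CPU (serialization and the like), and those sections are very short.&lt;/p&gt;

&lt;p&gt;In theory this small per-span overhead can add up, but in an async environment where FastAPI runs under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uvicorn&lt;/code&gt;, GIL contention is much lower to begin with, so it rarely turns into a noticeable slowdown.&lt;/p&gt;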

&lt;hr /&gt;

&lt;h2 id=&quot;3-백그라운드-스레드-구현&quot;&gt;3. Background thread implementation&lt;/h2&gt;

&lt;h3 id=&quot;langfuse-sdk-v3v4&quot;&gt;Langfuse SDK (v3/v4)&lt;/h3&gt;

&lt;p&gt;To send spans, Langfuse SDK v3/v4 uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LangfuseSpanProcessor&lt;/code&gt;, which subclasses OpenTelemetry's &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# langfuse/_client/span_processor.py
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;opentelemetry.sdk.trace.export&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BatchSpanProcessor&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;LangfuseSpanProcessor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BatchSpanProcessor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;&quot;&quot;BatchSpanProcessor extension that adds Langfuse-specific filtering and authentication&quot;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Internally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt; uses Python's standard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;threading&lt;/code&gt; module and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;collections.deque&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The background threads are split into two roles.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Span/trace export&lt;/strong&gt;: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LangfuseSpanProcessor&lt;/code&gt; (subclasses OTel &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Score ingestion and media upload&lt;/strong&gt;: separate consumer threads in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LangfuseResourceManager&lt;/code&gt; (uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;threading&lt;/code&gt; directly)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
  &lt;p&gt;Reference: &lt;a href=&quot;https://github.com/langfuse/langfuse-python/blob/main/langfuse/_client/span_processor.py&quot;&gt;langfuse-python GitHub: span_processor.py&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;langfuse-sdk-v2-레거시&quot;&gt;Langfuse SDK v2 (legacy)&lt;/h3&gt;

&lt;p&gt;v2 is structured completely differently. It is not OTel-based; instead, its own &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskManager&lt;/code&gt; creates consumer threads directly with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;threading&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# langfuse/task_manager.py (v2)
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;init_resources&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_threads&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ingestion_consumer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IngestionConsumer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(...)&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;ingestion_consumer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# calls threading.Thread.start() directly
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Because of this structure, a fork-safety problem arises under Gunicorn &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--preload&lt;/code&gt;. (See section 5 for details.)&lt;/p&gt;

&lt;h3 id=&quot;opentelemetry-batchspanprocessor&quot;&gt;OpenTelemetry BatchSpanProcessor&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt; itself is a thin wrapper; the real logic is delegated to a shared class, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchProcessor&lt;/code&gt;. The deque, the worker thread, and the trigger conditions are all implemented inside that shared class.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;opentelemetry-sdk/src/opentelemetry/sdk/
├── trace/export/__init__.py         ← BatchSpanProcessor (wrapper)
└── _shared_internal/__init__.py     ← BatchProcessor (actual logic: deque, worker thread)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;Reference: &lt;a href=&quot;https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-sdk/src/opentelemetry/sdk/_shared_internal/__init__.py&quot;&gt;OpenTelemetry Python SDK source: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_shared_internal/__init__.py&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;4-싱글톤-패턴과-스레드-초기화-시점&quot;&gt;4. The singleton pattern and when threads are initialized&lt;/h2&gt;

&lt;h3 id=&quot;langfuse-sdk&quot;&gt;Langfuse SDK&lt;/h3&gt;

&lt;p&gt;The Langfuse SDK uses a &lt;strong&gt;singleton pattern&lt;/strong&gt;. Clients are managed as singletons keyed by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;public_key&lt;/code&gt;, and the background threads are created &lt;strong&gt;once, the first time&lt;/strong&gt; a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Langfuse()&lt;/code&gt; client is initialized.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Background threads are created at this point
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Langfuse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;public_key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;pk-lf-...&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;secret_key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;sk-lf-...&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Calling again with the same public_key reuses the existing singleton; no new threads are created
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Langfuse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;public_key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;pk-lf-...&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# returns the existing instance
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Even if you never call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Langfuse()&lt;/code&gt; explicitly, the client is initialized automatically from environment variables the first time &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;get_client()&lt;/code&gt; is called.&lt;/p&gt;
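
&lt;p&gt;A keyed singleton of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the Langfuse implementation: the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Client&lt;/code&gt; class, its registry, and its idle worker thread are hypothetical and only show that the thread is started once per key.&lt;/p&gt;

```python
import threading


class Client:
    """Hypothetical client: one instance, and one worker thread, per public key."""

    _instances = {}
    _lock = threading.Lock()

    def __new__(cls, public_key):
        with cls._lock:
            instance = cls._instances.get(public_key)
            if instance is None:
                instance = super().__new__(cls)
                instance.public_key = public_key
                instance._stop = threading.Event()
                # The worker starts only on first construction for this key.
                instance.worker = threading.Thread(target=instance._stop.wait, daemon=True)
                instance.worker.start()
                cls._instances[public_key] = instance
            return instance

    def shutdown(self):
        """Signal the worker to stop and wait for it to exit."""
        self._stop.set()
        self.worker.join()


a = Client("pk-demo")
b = Client("pk-demo")     # same key: existing instance, no new thread
c = Client("pk-other")    # different key: separate instance and thread
same_instance = a is b
same_worker = a.worker is b.worker
distinct = a is not c
a.shutdown()
c.shutdown()
```

&lt;p&gt;The practical consequence is the one that matters for deployment: wherever the first construction happens is where the thread is born, so the location of that first call decides which process owns it.&lt;/p&gt;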

&lt;blockquote&gt;
  &lt;p&gt;Reference: &lt;a href=&quot;https://langfuse.com/docs/sdk/python/low-level-sdk&quot;&gt;Langfuse Python SDK Overview&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;phoenix-opentelemetry-tracerprovider&quot;&gt;Phoenix (OpenTelemetry TracerProvider)&lt;/h3&gt;

&lt;p&gt;Unlike Langfuse, Phoenix does not implement a singleton pattern of its own. Instead, it relies on OpenTelemetry's &lt;strong&gt;global TracerProvider&lt;/strong&gt; mechanism.&lt;/p&gt;

&lt;p&gt;Calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register()&lt;/code&gt; creates a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TracerProvider&lt;/code&gt; and, by default, registers it as the OpenTelemetry global provider.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;phoenix.otel&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;register&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;# Creates the TracerProvider, registers it globally, and starts the BatchSpanProcessor worker thread
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tracer_provider&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;register&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my-app&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;batch&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;set_global_tracer_provider&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# default: True
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The global &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TracerProvider&lt;/code&gt; can be &lt;strong&gt;set only once&lt;/strong&gt;. Any later call to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;set_tracer_provider()&lt;/code&gt; logs a warning and is ignored. For that reason, call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;register()&lt;/code&gt; &lt;strong&gt;exactly once&lt;/strong&gt;, at application startup.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt; worker thread starts when the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TracerProvider&lt;/code&gt; is created. On shutdown, call &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tracer_provider.shutdown()&lt;/code&gt; explicitly so that the worker thread stops cleanly and the spans still in the queue are flushed.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Shut down Phoenix (OTel) in the FastAPI lifespan
&lt;/span&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;contextlib&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asynccontextmanager&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;fastapi&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;opentelemetry&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trace&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asynccontextmanager&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;yield&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# On shutdown, flush unsent spans and stop the worker thread
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;trace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_tracer_provider&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shutdown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;Reference: &lt;a href=&quot;https://arize-phoenix.readthedocs.io/projects/otel/en/latest/api/register.html&quot;&gt;Phoenix OTEL register() official documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;5-gunicorn--uvicorn-멀티-프로세스-환경에서의-주의사항&quot;&gt;5. Caveats in Gunicorn / Uvicorn multi-process environments&lt;/h2&gt;

&lt;h3 id=&quot;gunicorn-pre-fork-모델&quot;&gt;Gunicorn pre-fork model&lt;/h3&gt;

&lt;p&gt;Gunicorn uses a &lt;strong&gt;pre-fork&lt;/strong&gt; model: before accepting any HTTP request, the master process creates its workers with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;os.fork()&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“The pre in pre-forked means that the master process creates the workers before handling any HTTP request.”&lt;/p&gt;

  &lt;p&gt;Source: &lt;a href=&quot;https://medium.com/building-the-system/gunicorn-3-means-of-concurrency-efbb547674b7&quot;&gt;Gunicorn architecture explainer&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

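&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;os.fork()&lt;/code&gt; &lt;strong&gt;copies the parent&apos;s memory but not its threads.&lt;/strong&gt; Only the thread that called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fork()&lt;/code&gt; exists in the child, which is the behavior POSIX specifies.&lt;/p&gt;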
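
&lt;p&gt;A minimal sketch of this behavior, using only the Python standard library on a POSIX system: a background thread is started, the process forks, and the child reports how many threads it can see. The function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;count_threads_around_fork&lt;/code&gt; is made up for this illustration and is not part of Gunicorn or any SDK.&lt;/p&gt;

```python
import os
import threading


def count_threads_around_fork():
    """Return (threads seen by the parent, threads seen by the forked child)."""
    stop = threading.Event()
    background = threading.Thread(target=stop.wait, daemon=True)
    background.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: report the thread count through the pipe and exit
        # immediately, without running the parent's cleanup code.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent process: read the child's report, then stop the background thread.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    parent_count = threading.active_count()
    stop.set()
    background.join()
    return parent_count, child_count


if __name__ == "__main__":
    parent_count, child_count = count_threads_around_fork()
    print("parent:", parent_count, "child:", child_count)
```

&lt;p&gt;The child should report a single thread, because the background thread was not carried across the fork. A consumer thread created in a preloaded Gunicorn master ends up in the same situation.&lt;/p&gt;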

&lt;h3 id=&quot;gunicorn---preload-옵션&quot;&gt;Gunicorn &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--preload&lt;/code&gt; option&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--preload&lt;/code&gt; is an option for &lt;strong&gt;saving memory&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By default, Gunicorn forks first and each worker then loads the application code on its own. With &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--preload&lt;/code&gt;, the master process loads the application before forking, so the workers can share memory pages thanks to the OS copy-on-write mechanism.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“By preloading an application you can save some RAM resources as well as speed up server boot times.”&lt;/p&gt;

  &lt;p&gt;Source: &lt;a href=&quot;https://docs.gunicorn.org/en/stable/settings.html&quot;&gt;Gunicorn official documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt;Default (no preload)&lt;/th&gt;
      &lt;th&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--preload&lt;/code&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;When the app is loaded&lt;/td&gt;
      &lt;td&gt;After fork, separately in each worker&lt;/td&gt;
      &lt;td&gt;Before fork, once in the master&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Memory&lt;/td&gt;
      &lt;td&gt;Number of workers × app size&lt;/td&gt;
      &lt;td&gt;Shared via copy-on-write, so lower&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Boot speed&lt;/td&gt;
      &lt;td&gt;Slower&lt;/td&gt;
      &lt;td&gt;Faster&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Compatible with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--reload&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Yes&lt;/td&gt;
      &lt;td&gt;No&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;sdk-버전별-fork-안전성&quot;&gt;Fork safety by SDK version&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;SDK&lt;/th&gt;
      &lt;th&gt;Version&lt;/th&gt;
      &lt;th&gt;Span delivery mechanism&lt;/th&gt;
      &lt;th&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;os.register_at_fork&lt;/code&gt;&lt;/th&gt;
      &lt;th&gt;Safe with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--preload&lt;/code&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;Langfuse&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;v2&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskManager&lt;/code&gt; with its own &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;threading&lt;/code&gt; consumer&lt;/td&gt;
      &lt;td&gt;Not used&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Risky&lt;/strong&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Langfuse&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;v3/v4&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LangfuseSpanProcessor&lt;/code&gt;, a subclass of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Inherited&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Phoenix (OTel)&lt;/td&gt;
      &lt;td&gt;-&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Used&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 id=&quot;langfuse-v2의-문제&quot;&gt;The problem with Langfuse v2&lt;/h4&gt;

&lt;p&gt;v2 starts its consumer thread directly in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TaskManager.__init__&lt;/code&gt;. If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Langfuse()&lt;/code&gt; is initialized at module level under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--preload&lt;/code&gt;, the thread is created in the master process before the fork, the child processes have no such thread, and events are never sent. In the worst case, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;RuntimeError: can&apos;t start new thread&lt;/code&gt; is raised.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Real-world case: &lt;a href=&quot;https://github.com/langfuse/langfuse/issues/3405&quot;&gt;Langfuse GitHub Issue #3405&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# X v2: risky with --preload
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Langfuse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# initialized in the master, so workers have no consumer thread after fork
&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# O v2: safe with --preload when re-initialized in the post_fork hook
# gunicorn.conf.py
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;post_fork&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;server&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;worker&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;app&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;
    &lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Langfuse&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;         &lt;span class=&quot;c1&quot;&gt;# discard the instance created in the master
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Langfuse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;# create a fresh instance in the worker after fork
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;v2 is a legacy version that is no longer maintained. The real fix is to upgrade to v3/v4.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4 id=&quot;langfuse-v3v4와-phoenix는-안전한-이유&quot;&gt;Why Langfuse v3/v4 and Phoenix are safe&lt;/h4&gt;

&lt;p&gt;Internally, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt; uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;os.register_at_fork&lt;/code&gt; to re-initialize its worker thread automatically in the child process after a fork.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# Inside opentelemetry-sdk (_shared_internal/__init__.py)
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;hasattr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;register_at_fork&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;weak_reinit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;weakref&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WeakMethod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_at_fork_reinit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;os&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;register_at_fork&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;after_in_child&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;weak_reinit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LangfuseSpanProcessor&lt;/code&gt; in Langfuse v3/v4 subclasses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BatchSpanProcessor&lt;/code&gt;, so it gets this protection automatically.&lt;/p&gt;
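
&lt;p&gt;The same pattern can be reproduced with the standard library alone. The sketch below is an illustration, not the OpenTelemetry implementation: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ForkAwareWorker&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;alive_in_child&lt;/code&gt; are hypothetical names, and the worker only drains a queue instead of exporting spans.&lt;/p&gt;

```python
import os
import queue
import threading
import weakref


class ForkAwareWorker:
    """Hypothetical background consumer that restarts its thread after a fork."""

    def __init__(self):
        self._start_thread()
        if hasattr(os, "register_at_fork"):
            # Hold the method weakly so the fork hook itself does not pin the object.
            weak_restart = weakref.WeakMethod(self._start_thread)

            def restart_in_child():
                method = weak_restart()
                if method is not None:
                    method()

            os.register_at_fork(after_in_child=restart_in_child)

    def _start_thread(self):
        # The child gets a fresh queue and a fresh thread of its own.
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            self._queue.get()  # a real processor would export the item here

    def is_alive(self):
        return self._thread.is_alive()


def alive_in_child(is_alive):
    """Fork and return what is_alive() reports inside the child process."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        os.write(write_fd, b"1" if is_alive() else b"0")
        os._exit(0)
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"


if __name__ == "__main__":
    plain = threading.Thread(target=threading.Event().wait, daemon=True)
    plain.start()
    worker = ForkAwareWorker()
    print("plain thread alive in child:", alive_in_child(plain.is_alive))
    print("fork-aware worker alive in child:", alive_in_child(worker.is_alive))
```

&lt;p&gt;The plain thread is reported as not alive in the child, while the fork-aware worker has a running thread again because the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;after_in_child&lt;/code&gt; hook restarted it.&lt;/p&gt;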

&lt;h3 id=&quot;uvicorn---workers&quot;&gt;Uvicorn &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--workers&lt;/code&gt;&lt;/h3&gt;

&lt;p&gt;When Uvicorn starts multiple processes with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--workers&lt;/code&gt;, it does not use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;os.fork()&lt;/code&gt; as Gunicorn does; it uses the &lt;strong&gt;spawn&lt;/strong&gt; start method of Python&apos;s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;multiprocessing&lt;/code&gt; library. &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;spawn&lt;/code&gt; starts a completely new Python interpreter, so the application code runs again from the top. Every version of the Langfuse SDK, as well as Phoenix, is therefore initialized fresh in each worker, and the fork problem does not arise.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;uvicorn main:app --workers 4
    ↓ spawn → new Python interpreter × 4
    ↓ each worker runs the app code from the top
    ↓ each worker initializes its own SDK → OK
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;배포-환경별-정리&quot;&gt;Summary by deployment environment&lt;/h3&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Environment&lt;/th&gt;
      &lt;th&gt;Multi-process method&lt;/th&gt;
      &lt;th&gt;Langfuse v2&lt;/th&gt;
      &lt;th&gt;Langfuse v3/v4&lt;/th&gt;
      &lt;th&gt;Phoenix&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;uvicorn --workers N&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;spawn&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gunicorn&lt;/code&gt; (default)&lt;/td&gt;
      &lt;td&gt;os.fork (app loaded in each worker)&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gunicorn --preload&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;os.fork (app loaded in the master)&lt;/td&gt;
      &lt;td&gt;&lt;strong&gt;Risky&lt;/strong&gt;&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
      &lt;td&gt;Safe&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;6-fastapi--langfuse--phoenix-권장-설정&quot;&gt;6. Recommended FastAPI + Langfuse / Phoenix setup&lt;/h2&gt;

&lt;h3 id=&quot;langfuse-v3v4&quot;&gt;Langfuse (v3/v4)&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;contextlib&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asynccontextmanager&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;fastapi&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;get_client&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asynccontextmanager&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;yield&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# On shutdown, flush all unsent events and stop the background thread
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;get_client&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shutdown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;If the process exits without &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;shutdown()&lt;/code&gt;, events still in the queue may be lost.&lt;/p&gt;

  &lt;p&gt;Reference: &lt;a href=&quot;https://langfuse.com/docs/sdk/python&quot;&gt;Langfuse Python SDK official documentation&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;langfuse-v2--gunicorn---preload&quot;&gt;Langfuse v2 + Gunicorn &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--preload&lt;/code&gt;&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c1&quot;&gt;# gunicorn.conf.py
&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;preload_app&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;post_fork&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;server&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;worker&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;app&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_langfuse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;worker_exit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;server&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;worker&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;app&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;is&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;not&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;None&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;application&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;langfuse&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flush&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;phoenix&quot;&gt;Phoenix&lt;/h3&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;contextlib&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;asynccontextmanager&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;fastapi&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;phoenix.otel&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;register&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;opentelemetry&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;trace&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;tracer_provider&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;register&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;project_name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;my-app&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;batch&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;# always set explicitly in production
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asynccontextmanager&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;yield&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;# On shutdown, flush unsent spans and stop the worker thread
&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;trace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_tracer_provider&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shutdown&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;app&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FastAPI&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lifespan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;If &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tracer_provider.shutdown()&lt;/code&gt; is not called, the process can exit while the BatchSpanProcessor worker thread is still running, and spans left in the queue may be lost.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;참고-문서&quot;&gt;References&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://langfuse.com/docs/sdk/python&quot;&gt;Langfuse Python SDK&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://langfuse.com/docs/observability/features/queuing-batching&quot;&gt;Langfuse Event Queuing/Batching&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/langfuse/langfuse-python/blob/main/langfuse/_client/span_processor.py&quot;&gt;Langfuse SDK — span_processor.py&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/langfuse/langfuse/issues/3405&quot;&gt;Langfuse GitHub Issue #3405 (v2 fork issue)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://pypi.org/project/arize-phoenix-otel/&quot;&gt;arize-phoenix-otel PyPI&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arize-phoenix.readthedocs.io/projects/otel/en/latest/api/register.html&quot;&gt;Phoenix OTEL register() official documentation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://arize.com/docs/phoenix/self-hosting/architecture&quot;&gt;Phoenix Self-hosting Architecture&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-sdk/src/opentelemetry/sdk/_shared_internal/__init__.py&quot;&gt;OpenTelemetry Python SDK &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;_shared_internal&lt;/code&gt; source&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://gunicorn.org/design/&quot;&gt;Gunicorn Design&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.joelsleppy.com/blog/gunicorn-application-preloading/&quot;&gt;Gunicorn Application Preloading&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/langfuse/langfuse-python&quot;&gt;langfuse-python GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content>
 </entry>
 
 <entry>
   <title>Incident analysis: SSH/ping timeouts to one server caused by a Docker network subnet conflict in an air-gapped network</title>
   <link href="https://pyg410.github.io/infra/2026/05/12/docker-prob/"/>
   <updated>2026-05-12T00:00:00+09:00</updated>
   <id>https://pyg410.github.io/infra/2026/05/12/docker-prob</id>
   <content type="html">&lt;p&gt;개발서버에서 Docker compose를 이용해 container를 띄운 후 부터, 갑자기 Local → 개발서버로의 모든 SSH/PING/HTTP 요청에 대해 Timeout이 발생했다.
&lt;br /&gt;
처음에는 방화벽 문제라고 생각했지만, 실제 원인은 Docker Network Subnet 충돌로 인해 호스트 라우팅 테이블이 꼬인 문제였다.&lt;/p&gt;

&lt;p&gt;This post covers the following, in order.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The problem&lt;/li&gt;
  &lt;li&gt;The cause&lt;/li&gt;
  &lt;li&gt;How the cause was found&lt;/li&gt;
  &lt;li&gt;The fix&lt;/li&gt;
  &lt;li&gt;How to prevent a recurrence&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h1 id=&quot;문제-상황&quot;&gt;The problem&lt;/h1&gt;

&lt;p&gt;The following symptoms suddenly appeared on a server that was in service.&lt;/p&gt;

&lt;h2 id=&quot;증상&quot;&gt;Symptoms&lt;/h2&gt;

&lt;ul&gt;
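  &lt;li&gt;SSH timed out only on the server running the Docker Compose containers&lt;/li&gt;
  &lt;li&gt;ping timed out as well&lt;/li&gt;
  &lt;li&gt;Other servers were reachable as usual&lt;/li&gt;
  &lt;li&gt;SSH and HTTP from other servers to the affected server worked&lt;/li&gt;
  &lt;li&gt;Only connections from my local PC to the affected server failed&lt;/li&gt;
  &lt;li&gt;The Docker container ports were in the LISTEN state&lt;/li&gt;
  &lt;li&gt;The sshd process was running normally&lt;/li&gt;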
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ssh user@server &lt;span class=&quot;nt&quot;&gt;-p&lt;/span&gt; 7002
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Connection timed out
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;ping timed out in the same way:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ping server-ip
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;hr /&gt;

&lt;h1 id=&quot;처음-의심했던-원인들&quot;&gt;Initial suspects&lt;/h1&gt;

&lt;p&gt;At first I suspected the following.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A missing entry in the production firewall whitelist&lt;/li&gt;
  &lt;li&gt;A fail2ban block&lt;/li&gt;
  &lt;li&gt;An sshd failure&lt;/li&gt;
  &lt;li&gt;A Docker iptables conflict&lt;/li&gt;
  &lt;li&gt;A Security Group problem&lt;/li&gt;
  &lt;li&gt;A network failure on the server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But something did not add up.&lt;/p&gt;

&lt;h2 id=&quot;이상했던-점&quot;&gt;What did not add up&lt;/h2&gt;

&lt;p&gt;From other servers, SSH and HTTP requests to the affected server worked normally.&lt;/p&gt;

&lt;p&gt;In other words:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;other server -&amp;gt; affected server : OK
my PC -&amp;gt; affected server : timeout
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So the server itself was alive,
and the problem occurred only for traffic from one specific source network.&lt;/p&gt;

&lt;hr /&gt;

&lt;h1 id=&quot;원인-분석&quot;&gt;Root-cause analysis&lt;/h1&gt;

&lt;p&gt;I listed the Docker networks on the affected server.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker network &lt;span class=&quot;nb&quot;&gt;ls&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then I checked the subnet of a specific network.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker network inspect NETWORK_NAME
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That is where I found the problem.&lt;/p&gt;

&lt;h2 id=&quot;문제의-docker-subnet&quot;&gt;The offending Docker subnet&lt;/h2&gt;

&lt;p&gt;The Docker network was using the following subnet.&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aaa.bbb.0.0/16
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;My local PC&apos;s IP address, however, was in the same range.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aaa.bbb.x.x
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
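
&lt;p&gt;A quick way to spot this kind of overlap is to test client addresses against each Docker subnet. The sketch below uses Python&apos;s standard &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ipaddress&lt;/code&gt; module; the subnets and client addresses are made-up values standing in for the masked &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aaa.bbb&lt;/code&gt; range, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;find_conflicting_subnets&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```python
import ipaddress


def find_conflicting_subnets(docker_subnets, client_ip):
    """Return the Docker subnets that contain the given client address."""
    client = ipaddress.ip_address(client_ip)
    return [
        subnet
        for subnet in docker_subnets
        if client in ipaddress.ip_network(subnet)
    ]


if __name__ == "__main__":
    # Assumed values for illustration only.
    docker_subnets = ["172.17.0.0/16", "172.18.0.0/16"]
    print(find_conflicting_subnets(docker_subnets, "172.18.0.106"))
    print(find_conflicting_subnets(docker_subnets, "192.168.10.7"))
```

&lt;p&gt;Any subnet returned for an address that belongs to a real client is a candidate for the conflict described here.&lt;/p&gt;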

&lt;hr /&gt;

&lt;h1 id=&quot;실제로-발생한-문제&quot;&gt;What actually went wrong&lt;/h1&gt;

&lt;p&gt;When Docker creates a network, it adds a route to the host OS routing table.&lt;/p&gt;

&lt;p&gt;Check the host routing table:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ip route
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the failing state, the following route had been added.&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aaa.bbb.0.0/16 dev br-xxxxx
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Meaning:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Send anything in the aaa.bbb.* range to the Docker bridge network
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;즉 서버는 내 PC IP를 외부 네트워크가 아니라 Docker 내부 네트워크 주소로 오인하게 되었다.&lt;/p&gt;
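
&lt;p&gt;The route selection described above can be sketched as a longest-prefix match. This is a simplified model, not the kernel's full logic (metrics and policy rules are ignored), and the concrete addresses, the default route, and the interface names are hypothetical stand-ins for the anonymized aaa.bbb.* values.&lt;/p&gt;

```python
import ipaddress

# Hypothetical stand-ins for the anonymized aaa.bbb.* values in this post.
ROUTES = [
    ("0.0.0.0/0", "eth0"),          # assumed default route via the real NIC
    ("172.18.0.0/16", "br-xxxxx"),  # route added for the Docker bridge network
]


def pick_interface(dest_ip, routes=ROUTES):
    """Return the device of the most specific route that contains dest_ip."""
    dest = ipaddress.ip_address(dest_ip)
    candidates = [
        (ipaddress.ip_network(cidr), dev)
        for cidr, dev in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    # The longest prefix wins, so the /16 bridge route beats the /0 default.
    network, dev = max(candidates, key=lambda item: item[0].prefixlen)
    return dev


print(pick_interface("172.18.0.106"))  # an address inside the bridge subnet
print(pick_interface("8.8.8.8"))       # an address only the default route covers
```

&lt;p&gt;Any destination inside the bridge subnet is handed to the bridge interface, even when the real machine with that address sits on the physical network.&lt;/p&gt;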

&lt;hr /&gt;

&lt;h1 id=&quot;왜-timeout이-발생했는가&quot;&gt;Why Did the Timeouts Occur?&lt;/h1&gt;

&lt;p&gt;The packet flow was as follows.&lt;/p&gt;
&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;My PC -&amp;gt; server
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The request packet reaches the server normally.&lt;/p&gt;
&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Server -&amp;gt; my PC (response)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is where the problem occurs.&lt;/p&gt;

&lt;p&gt;The server concluded that&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aaa.bbb.0.x belongs to the Docker network range
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and sent the response packet out through the Docker bridge (br-xxxx) instead of the real NIC (eth0).&lt;/p&gt;

&lt;p&gt;Because the response packets were routed to the wrong interface, all of the following timed out:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;ping timeout&lt;/li&gt;
  &lt;li&gt;ssh timeout&lt;/li&gt;
  &lt;li&gt;http timeout&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h1 id=&quot;확인-방법&quot;&gt;How to Check&lt;/h1&gt;

&lt;h3 id=&quot;현재-라우팅-테이블-확인&quot;&gt;Check the current routing table&lt;/h3&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ip route
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;특정-ip가-어디로-라우팅되는지-확인&quot;&gt;Check where a specific IP is routed&lt;/h3&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ip route get MY_PC_IP
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the failing state, the output looked like this:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;aaa.bbb.0.106 dev br-xxxxx
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That is, traffic to my PC was being routed to the Docker bridge.&lt;/p&gt;
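
&lt;p&gt;The check above can be scripted roughly as follows. This is a minimal sketch that assumes a Linux host with the iproute2 ip command; the helper names and the sample output line are hypothetical, and the bridge test relies only on Docker's br- and docker0 interface naming.&lt;/p&gt;

```python
import subprocess


def parse_route_device(route_get_output):
    """Extract the interface name that follows the 'dev' keyword, or None."""
    tokens = route_get_output.split()
    if "dev" in tokens:
        index = tokens.index("dev")
        if index + 1 != len(tokens):
            return tokens[index + 1]
    return None


def route_device_for(ip):
    """Run `ip route get` for the address and return the chosen device."""
    result = subprocess.run(
        ["ip", "route", "get", ip],
        capture_output=True, text=True, check=True,
    )
    return parse_route_device(result.stdout)


def looks_like_docker_bridge(device):
    # Docker names user-defined bridges br-NETWORKID; the default one is docker0.
    return device is not None and (device.startswith("br-") or device == "docker0")


# Hypothetical output line, shaped like the one shown in this post.
sample = "172.18.0.106 dev br-xxxxx src 172.18.0.1 uid 1000"
print(parse_route_device(sample))
print(looks_like_docker_bridge(parse_route_device(sample)))
```

&lt;p&gt;On the affected server, calling route_device_for with the PC's IP would be expected to return a bridge interface rather than the physical NIC.&lt;/p&gt;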

&lt;hr /&gt;

&lt;h1 id=&quot;해결-방법&quot;&gt;The Fix&lt;/h1&gt;

&lt;p&gt;I deleted the offending Docker network.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;docker network &lt;span class=&quot;nb&quot;&gt;rm &lt;/span&gt;NETWORK_NAME
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Every symptom cleared up immediately after the deletion.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;ping restored&lt;/li&gt;
  &lt;li&gt;SSH restored&lt;/li&gt;
  &lt;li&gt;HTTP restored&lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h1 id=&quot;재발-방지-방법&quot;&gt;Preventing Recurrence&lt;/h1&gt;

&lt;p&gt;The most important step is to manage Docker network ranges explicitly.&lt;/p&gt;

&lt;p&gt;This is essential in air-gapped and corporate network environments.&lt;/p&gt;

&lt;hr /&gt;

&lt;h1 id=&quot;방법-1-docker-default-address-pool-고정&quot;&gt;Method 1. Pin the Docker Default Address Pool&lt;/h1&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/etc/docker/daemon.json&lt;/code&gt;&lt;/p&gt;

&lt;div class=&quot;language-json highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;default-address-pools&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;base&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;172.30.0.0/16&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
      &lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;&quot;size&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;w&quot;&gt; &lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
    &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
  &lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;w&quot;&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

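&lt;p&gt;After saving the file, run the command below. Networks that already exist keep their current subnet; the pool applies only to networks created afterwards.&lt;/p&gt;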

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;systemctl restart docker
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
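
&lt;p&gt;To see what the base and size values above imply, the arithmetic can be sketched with Python's standard ipaddress module. The variable names are illustrative; only the two values come from the daemon.json shown above.&lt;/p&gt;

```python
import ipaddress

base = ipaddress.ip_network("172.30.0.0/16")  # "base" from the daemon.json above
size = 24                                     # "size" from the daemon.json above

# Every network Docker allocates from this pool is one /24 slice of the /16.
pool = list(base.subnets(new_prefix=size))

print(len(pool))          # how many /24 networks the pool can hand out
print(pool[0], pool[-1])  # first and last candidate subnets
```

&lt;p&gt;A /16 base with size 24 therefore leaves room for 256 networks, all inside 172.30.0.0/16, which makes it easy to reserve that one range with the network team.&lt;/p&gt;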

&lt;hr /&gt;

&lt;h1 id=&quot;방법-2-docker-compose에서-subnet-명시&quot;&gt;Method 2. Specify the Subnet in Docker Compose&lt;/h1&gt;

&lt;div class=&quot;language-yaml highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;na&quot;&gt;networks&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
  &lt;span class=&quot;na&quot;&gt;backend&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;na&quot;&gt;ipam&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
      &lt;span class=&quot;na&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;pi&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;subnet&lt;/span&gt;&lt;span class=&quot;pi&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;aaa.bbb.10.0/24&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;hr /&gt;

&lt;h1 id=&quot;운영-환경에서-주의할-점&quot;&gt;Things to Watch in Production&lt;/h1&gt;

&lt;p&gt;In corporate network and VPN environments, the ranges below are often already in use.&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;10.x.x.x
192.168.x.x
172.16.x.x ~ 172.31.x.x
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If you rely on Docker's automatic subnet allocation as-is, it can collide with the real network.&lt;/p&gt;
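
&lt;p&gt;One way to catch such a collision before creating a network is an overlap check. The sketch below uses Python's standard ipaddress module; the function name and the list of in-use ranges are hypothetical and would need to be replaced with the ranges of your own LAN and VPN.&lt;/p&gt;

```python
import ipaddress


def find_conflicts(candidate, in_use):
    """Return the in-use networks that overlap the candidate Docker subnet."""
    cand = ipaddress.ip_network(candidate)
    return [cidr for cidr in in_use if cand.overlaps(ipaddress.ip_network(cidr))]


# Hypothetical ranges already taken by the office LAN and VPN.
IN_USE = ["10.0.0.0/8", "192.168.0.0/24", "172.18.0.0/16"]

print(find_conflicts("172.18.5.0/24", IN_USE))   # overlaps an in-use range
print(find_conflicts("172.30.10.0/24", IN_USE))  # no overlap with the list
```

&lt;p&gt;An empty result only means the candidate is clear of the ranges you listed, so the list has to be kept in step with the actual network plan.&lt;/p&gt;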

&lt;hr /&gt;

&lt;h1 id=&quot;결론&quot;&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;The root cause of this outage was:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The Docker network subnet collided with the real network range,
so the host routing table preferred the Docker bridge.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A Docker network is not confined to the containers;
it also directly affects routing on the host OS.&lt;/p&gt;

&lt;p&gt;In air-gapped and corporate network environments in particular, the Docker subnet policy must be managed explicitly.&lt;/p&gt;
</content>
 </entry>
 

</feed>
