<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>LLM on Technical Blog</title>
    <link>https://hugo-blog-923.pages.dev/tags/llm/</link>
    <description>Recent content in LLM on Technical Blog</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Tue, 07 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://hugo-blog-923.pages.dev/tags/llm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>LLM-as-judge for RAG: what to score, what to distrust</title>
      <link>https://hugo-blog-923.pages.dev/posts/rag-llm-judge-quality-loop/</link>
      <pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://hugo-blog-923.pages.dev/posts/rag-llm-judge-quality-loop/</guid>
      <description>&lt;p align=&#34;center&#34;&gt;
  &lt;img src=&#34;pexels_12969403.jpg&#34; alt=&#34;Laptop showing an analytics-style dashboard for automated answer-quality signals.&#34; style=&#34;max-width: min(100%, 820px); height: auto;&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LLM-as-judge&lt;/strong&gt; adds scale when human reviewers cannot read every RAG interaction. A common pattern: after the answer path returns, the &lt;strong&gt;question, answer, and retrieved context&lt;/strong&gt; are enqueued; a &lt;strong&gt;worker Lambda&lt;/strong&gt; runs a judge prompt; results land in a &lt;strong&gt;database&lt;/strong&gt; for analytics.&lt;/p&gt;
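&lt;p&gt;A minimal sketch of the worker step, assuming an SQS-style message body and a pluggable judge call. The record shape, rubric wording, and function names here are illustrative, not a specific library's API; in production the stubbed &lt;code&gt;judge_fn&lt;/code&gt; would be a model invocation.&lt;/p&gt;

```python
# Sketch of the worker that scores one queued RAG interaction.
# The judge call is stubbed; in production it would invoke an LLM API.
import json

RUBRIC = (
    "Score the answer from 1 to 5 for faithfulness to the retrieved context. "
    "Reply with JSON: {\"score\": int, \"reason\": str}"
)

def build_judge_prompt(question, answer, context_chunks):
    """Assemble the judge prompt from one queued record."""
    joined = "\n---\n".join(context_chunks)
    return "\n\n".join([RUBRIC,
                        "Question: " + question,
                        "Answer: " + answer,
                        "Context:\n" + joined])

def score_record(record, judge_fn):
    """Run the judge on one queued message body; return a row for storage."""
    body = json.loads(record["body"])
    prompt = build_judge_prompt(body["question"], body["answer"], body["context"])
    raw = judge_fn(prompt)                         # model call in production
    verdict = json.loads(raw)
    score = max(1, min(5, int(verdict["score"])))  # clamp malformed scores
    return {"qid": body["qid"], "score": score, "reason": verdict["reason"]}
```

&lt;p&gt;Clamping the score and parsing the verdict defensively matters because judge output is itself model output and can drift out of range.&lt;/p&gt;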
&lt;h2 id=&#34;what-judges-are-good-for&#34;&gt;What judges are good for&lt;/h2&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Use&lt;/th&gt;
          &lt;th&gt;Reason&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Trend monitoring&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Average scores or failure flags shifting after a deploy&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Sampling for humans&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Pull low-scoring rows for manual review&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;&lt;strong&gt;Regression alarms&lt;/strong&gt;&lt;/td&gt;
          &lt;td&gt;Chunk size, top-k, or model changes moving the distribution&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Judges are &lt;strong&gt;cheap sensors&lt;/strong&gt;, not auditors.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Internal-docs RAG: chat ingress, vector search, and an async judge loop</title>
      <link>https://hugo-blog-923.pages.dev/posts/internal-docs-rag-architecture-notes/</link>
      <pubDate>Sat, 04 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://hugo-blog-923.pages.dev/posts/internal-docs-rag-architecture-notes/</guid>
      <description>&lt;p align=&#34;center&#34;&gt;
  &lt;img src=&#34;pexels_10376254.jpg&#34; alt=&#34;Overhead view of a professional at a desk with laptop, tablet, and papers — internal work and digital tools.&#34; style=&#34;max-width: min(100%, 820px); height: auto;&#34; /&gt;
&lt;/p&gt;
&lt;p&gt;This note describes an &lt;strong&gt;internal RAG&lt;/strong&gt; pattern for &lt;strong&gt;policy and handbook-style documents&lt;/strong&gt;: employees ask questions in a familiar chat surface, the backend retrieves by &lt;strong&gt;semantic similarity&lt;/strong&gt;, and a &lt;strong&gt;separate evaluation path&lt;/strong&gt; scores answers for quality and traceability. The layout maps cleanly to typical &lt;strong&gt;AWS&lt;/strong&gt; building blocks (API Gateway, Lambdas, object storage, a vector index, DynamoDB, and a queue).&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
