<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Michael Bassili</title>
        <description>I&apos;m a Cloud Developer Who Loves to Build Cool Stuff With AWS, Azure, Python, and AI.</description>
        <link>https://bassi.li/</link>
        <atom:link href="https://bassi.li/feed.xml" rel="self" type="application/rss+xml"/>
        <pubDate>Fri, 20 Feb 2026 02:47:11 +0000</pubDate>
        <lastBuildDate>Fri, 20 Feb 2026 02:47:11 +0000</lastBuildDate>
        <generator>Jekyll v3.10.0</generator>
        
            <item>
                <title>I Miss Using Em Dashes</title>
                <description>&lt;p&gt;I really miss using em dashes in my writing. Ever since content creators started using ChatGPT to help (or supplement) their writing, em dashes have become indicators of AI use. Students are routinely caught with their pants down as professors flag an essay as AI-generated based on the presence of lists, positive-leaning prose, and em dashes.&lt;/p&gt;

&lt;p&gt;Em dashes can be found everywhere across my personal and professional writing. Nowadays, I find myself avoiding em dashes because I’m afraid that my writing will be flagged as AI-generated and dismissed as slop. I feel like I have to “dumb down” aspects of my writing to convince readers that the words they are skimming were, in fact, written by a human. In turn, this results in a sort of meta-game where I choose my words carefully—typically ensuring that I include the &lt;em&gt;right&lt;/em&gt; amount of grammatical character and/or mistakes—to convince readers that they aren’t wasting their time reading slop on the internet. Writing these two em dashes &lt;em&gt;felt&lt;/em&gt; suspicious because I’m trying to insert them into my writing where readers will least expect ChatGPT to add them.&lt;/p&gt;

&lt;p&gt;I’m curious (and more than a bit worried) that the writing that is being produced these days is being shaped by LLMs, even if an LLM has never touched a particular piece of prose. We are all collectively aware of what slop “feels like” to read, and that means that serious writers are conscious of how their word choice, punctuation, and flow are perceived by readers. The resulting piece of writing has therefore been shaped by the mere presence of consumer-grade LLMs.&lt;/p&gt;

&lt;p&gt;The worst part is that products like ChatGPT can change models under the hood; a new foundation model might drop that overuses something else, like semicolons, leading future articles/books/papers/reports/etc. to shun them for fear of arousing suspicion. As a software engineer, I love LLMs, but I’m unhappy with the amount of &lt;em&gt;soft power&lt;/em&gt; they have over the creatives of the world, especially online. If an em dash fits into one’s writing but they avoid using it out of fear, our AI overlords have won.&lt;/p&gt;
</description>
                <pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate>
                <link>https://bassi.li/articles/i-miss-using-em-dashes</link>
                <guid isPermaLink="true">https://bassi.li/articles/i-miss-using-em-dashes</guid>
                
                <category>AI</category>
                
                <category>Writing</category>
                
                <category>Art</category>
                
                
            </item>
        
            <item>
                <title>The Singular Devotion of Oppenheimer and the Cost of Absolute Loyalty</title>
                <description>&lt;p&gt;One of my favourite public figures is Robert Oppenheimer, a tragic, passionate scientist who was martyred for politics. JRO reminds me of Naked Snake, another character who suffered for his nation at the expense of both his personal creed and his relationships. I read American Prometheus around the time that the movie Oppenheimer hit theatres, taking copious notes and replaying Metal Gear Solid 3 at the same time. Snake and Oppenheimer both altered their respective worlds, sacrificing themselves in the process. It’s strange to realize how tightly connected the world’s greatest scientists were during the mid-20th century. Names like Niels Bohr, Enrico Fermi, and Albert Einstein often seem separated by time, tucked away into different chapters of physics textbooks. But in reality, they all crossed paths, often literally sitting at the same tables, drinking together, arguing, and shaping the modern world in real time. J. Robert Oppenheimer, the theoretical physicist most associated with the Manhattan Project, wasn’t simply a solitary genius in the desert. He was right in the center of it all, surrounded by peers whose names would later become synonymous with science itself. Oppenheimer’s life wasn’t just a story of intellectual achievement. It was a case study in how absolute dedication to a single pursuit can both elevate and destroy a person. He lived in full commitment to his craft and to his country. The price of that commitment was high.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“Man is a creature whose substance is faith.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There is a quote he often referenced from the Bhagavad-Gita, which he first cited upon the death of Franklin D. Roosevelt. He said, “Man is a creature whose substance is faith. What his faith is, he is.” It became clear that Oppenheimer’s faith had always been tied to science and, later, to his nation. That faith defined him completely. One of the more unsettling revelations in his story is the way the United States deployed the atomic bomb. The common belief is that the bombings of Hiroshima and Nagasaki were necessary to end the war. Oppenheimer himself was surprised to learn that Japan was already close to surrender. The Soviet Union was preparing to enter the Pacific theater, which would have likely ended the war without the need for such devastation. Nevertheless, American leadership chose to proceed with the attacks, targeting cities with large civilian populations to make a global statement.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;His security clearance was revoked after a series of hearings designed not to seek truth but to publicly humiliate him.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Oppenheimer became haunted by this decision. Though he had dedicated years of his life to building the bomb, he began to fear what he had unleashed. He warned against further escalation, particularly the development of even more destructive hydrogen bombs. His warnings were not welcomed. Instead, the government began to treat him as a political liability. His security clearance was revoked after a series of hearings designed not to seek truth but to publicly humiliate him. Key evidence was hidden from his defense team. The outcome had been decided before the hearings even began.&lt;/p&gt;

&lt;p&gt;His story mirrors the arc of Naked Snake in Metal Gear Solid 3, a character punished by the very nation he served. Snake, much like Oppenheimer, carried out his mission with complete loyalty, only to find himself discarded once his usefulness ended. Both men became weapons for their governments. Both were ultimately left behind once their missions became inconvenient. In both cases, the betrayal wasn’t accidental. It was calculated.&lt;/p&gt;

&lt;p&gt;Oppenheimer’s mistakes were not just technical but personal. He had previously cooperated with federal investigators, providing information about friends and colleagues with communist ties. He seemed to believe that his scientific stature would insulate him from consequences. When those same conversations were later used against him, he denied them, but it was too late. The damage was irreversible. He had been naive about how Washington worked. His own words became the rope with which his enemies tied the noose.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“We may be likened to two scorpions in a bottle, each capable of killing the other.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Lewis Strauss, a powerful figure in the Atomic Energy Commission, had long sought to discredit Oppenheimer. The political climate of the Red Scare gave Strauss the perfect opportunity to erase Oppenheimer from public life. His downfall wasn’t just political. It was total. He lost his position, his influence, and his role within the scientific community. His health deteriorated. His wife, Kitty, and his daughter, Toni, both died not long after. The man who had once stood at the center of the scientific world died feeling abandoned and erased.&lt;/p&gt;

&lt;p&gt;Oppenheimer once described the nuclear standoff between the United States and the Soviet Union by saying, “We may be likened to two scorpions in a bottle, each capable of killing the other.” He also warned, “You can’t have this kind of war. There just aren’t enough bulldozers to scrape the bodies off the street.” His vision for the future was clear and terrifying. Yet after his fall from grace, he was left without any meaningful role in shaping the world he had helped create.&lt;/p&gt;

&lt;p&gt;The story feels painfully familiar today. As Russia wages war in Ukraine, many analysts have noted parallels between Vladimir Putin and earlier Soviet leaders. There is a passage in &lt;em&gt;American Prometheus&lt;/em&gt; that captures this mindset, describing how Stalin sought to protect his internal empire but was not necessarily seeking external war, knowing that such a conflict could destabilize his regime. The same pattern appears to be playing out again, with internal strife reportedly growing in Russia as the war drags on. Leaders who lash out in fear of losing control often accelerate their own downfall.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;[H]e believed his loyalty and achievements would shield him.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Oppenheimer’s life raises an uncomfortable question. What happens to those who devote themselves entirely to serving a cause, only to discover that their sacrifice means nothing in the end? His tragedy wasn’t simply that he created something dangerous. It was that he believed his loyalty and achievements would shield him. In reality, the system he served was eager to discard him the moment he became a liability. His life was a warning, both about the dangers of unchecked technological power and about the brutal cost of idealism in a world that rewards control above all else. Just like Naked Snake, Oppenheimer learned that serving a nation doesn’t guarantee honor. Sometimes, it guarantees exile.&lt;/p&gt;
</description>
                <pubDate>Sun, 06 Jul 2025 00:00:00 +0000</pubDate>
                <link>https://bassi.li/articles/oppenheimer-cost-of-loyalty</link>
                <guid isPermaLink="true">https://bassi.li/articles/oppenheimer-cost-of-loyalty</guid>
                
                <category>Culture</category>
                
                <category>Literature</category>
                
                <category>History</category>
                
                <category>Games</category>
                
                
            </item>
        
            <item>
                <title>Evaluating AI Systems: From Criteria to Pipelines</title>
                <description>&lt;p&gt;I am reading the book &lt;a href=&quot;https://www.oreilly.com/library/view/ai-engineering/9781098166298/&quot;&gt;AI Engineering by Chip Huyen&lt;/a&gt; for an AI book club at work. 
These notes have been distilled and sanitized for public consumption from Chapter 4 of the book.  AI evaluation is a critical component of AI engineering. 
This chapter mainly covers evaluating AI systems. There are three main components:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Evaluation criteria&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Model selection&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Building out your evaluation pipelines&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three actions are needed to confidently build scalable and resilient AI systems.&lt;/p&gt;

&lt;h1 id=&quot;evaluation-criteria&quot;&gt;Evaluation Criteria&lt;/h1&gt;
&lt;p&gt;Evaluation-driven development is the process of understanding how an application will be evaluated before investing the time, money, and resources to build it.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Evaluation is the biggest bottleneck to AI adoption.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Companies need to start with a set of criteria that is specific to the kinds of applications they are trying to develop. An organization might want to choose &lt;em&gt;different&lt;/em&gt; models for different services/components depending on the kinds of things that AI will be doing in production. For example, you may want one model that specializes in providing summarization while another separate model classifies customer responses.&lt;/p&gt;

&lt;p&gt;Multiple-choice questions (MCQs) are a common way to evaluate models, but performance can vary with small changes in the way questions are posed, leading to fragile evaluations. In Chapter 5, there’s a deeper discussion on prompt sensitivity. MCQs also fall short when evaluating generation tasks, such as summarization, translation, and essay writing.&lt;/p&gt;

&lt;h2 id=&quot;generation-capacity&quot;&gt;Generation Capacity&lt;/h2&gt;

&lt;p&gt;OG metrics for models included &lt;strong&gt;fluency&lt;/strong&gt; (e.g., grammar, feel) and &lt;strong&gt;coherence&lt;/strong&gt; (e.g., logical structure of a response). But these two metrics have become insufficient for more modern models, requiring AI engineers to think harder about the ways in which they work with these LLMs. &lt;strong&gt;Natural language generation (NLG)&lt;/strong&gt; metrics have been repurposed to meet the needs of foundation models; nowadays, generated LLM responses are indistinguishable from real human replies, which means that fluency &amp;amp; coherence have become less important overall.&lt;/p&gt;

&lt;p&gt;Factual Consistency: how the model fares against context. A response is considered “correct” if it is supported by the provided context. Local factual consistency specifically evaluates the provided context, while global factual consistency evaluates broad knowledge (e.g., the sky is blue, not green). Verifying “facts” is the hardest part of factual consistency checking because there is a lot of subjectivity and false information embedded inside training data. Consider what response a foundation model should provide if a user asks “what is the most important meal of the day?” Or, “what is the best way to make a new friend?” There are infinite valid answers to these sorts of questions. Wan et al. (2024) found that existing models ignore things like references or neutral tone when deciding what kind of training data is accurate, so newer foundation models have gotten better at discerning what’s “fact” and what’s “fiction” or “opinion.”&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;When validating hallucinations, focus on checking for hallucinated niche knowledge, and queries about things that don’t exist.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;More sophisticated LLM-as-a-judge setups use techniques such as self-verification and knowledge-augmented verification to determine the quality and accuracy of an AI response.&lt;/p&gt;

&lt;p&gt;Search-augmented factuality evaluators can be used to break an output into individual facts and then use a search engine to verify said facts. Textual entailment is then used to determine the relationship between two segments: entailment (the hypothesis can be inferred from the premise), contradiction (the hypothesis contradicts the premise), and neutral (the premise neither entails nor contradicts the hypothesis). Instead of leveraging more general-purpose AI judges, you can train scorer models to specifically identify factual consistency by taking a premise-hypothesis pair as input and producing a predefined entailment label as output. DeBERTa-v3-mnli-fever-anli is a 184-million-parameter model that can be used for such a task.&lt;/p&gt;
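
&lt;p&gt;As a rough sketch of that scorer-model approach, the snippet below runs a premise/hypothesis pair through an off-the-shelf NLI checkpoint using the Hugging Face transformers pipeline. The exact checkpoint name and label set are assumptions based on publicly available DeBERTa-v3 NLI models, so double-check them against the model card before relying on this.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from transformers import pipeline

# Assumption: this public checkpoint and its labels (entailment,
# neutral, contradiction) match the scorer described above.
nli = pipeline(
    &quot;text-classification&quot;,
    model=&quot;MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli&quot;,
)

def consistency(premise, hypothesis):
    # NLI pipelines accept paired inputs as a text/text_pair dict.
    return nli({&quot;text&quot;: premise, &quot;text_pair&quot;: hypothesis}, top_k=None)

context = &quot;The meeting was moved to Friday at 3 p.m.&quot;
claim = &quot;The meeting now happens on Friday.&quot;
print(consistency(context, claim))  # entailment should score highest
&lt;/code&gt;&lt;/pre&gt;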

&lt;h2 id=&quot;safety-of-foundation-models&quot;&gt;Safety Of Foundation Models&lt;/h2&gt;

&lt;p&gt;Companies building customer-facing products must also keep safety at the forefront of their evaluations. Responses containing inappropriate language, harmful recommendations, hate speech, violence, or stereotypes are detrimental to the overall user experience and open organizations up to liability inquiries. Safety evaluations should be ongoing and aligned with customer-specific content guidelines.&lt;/p&gt;

&lt;h2 id=&quot;instruction-following-capability&quot;&gt;Instruction Following Capability&lt;/h2&gt;

&lt;p&gt;Some models follow instructions better than others, and that dramatically affects the quality of outputs for your application. Poor instruction following can directly degrade customer satisfaction and performance metrics.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;If the model is bad at following instructions, it doesn’t matter how good your instructions are, the outputs will be bad.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Benchmarks such as INFOBench evaluate a model’s ability to follow content constraints, such as discussion restrictions. However, the verification of expanded instruction types, such as linguistic guidelines and style, cannot (currently) be easily automated. If you instruct a model to use “language appropriate for a young audience,” how do you automatically verify that the output is indeed appropriate? What does “young” even mean? Introducing ambiguity into your requests is a sure-fire way to inject variance into your outputs.&lt;/p&gt;

&lt;p&gt;INFOBench found that GPT-4 is a reasonably reliable and cost-effective evaluator.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;GPT-4 isn’t as accurate as human experts, but it’s more accurate than annotators recruited through Amazon Mechanical Turk.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Roleplaying capability evaluation is also tricky to automate. RoleLLM evaluates a model’s ability to emulate a persona using pre-defined similarity scores and AI judges. This might be something worth investigating. In general, though, you should evaluate your roleplaying AI based on style and knowledge, the two key characteristics of these sorts of bots.&lt;/p&gt;

&lt;h2 id=&quot;cost--latency-considerations&quot;&gt;Cost &amp;amp; Latency Considerations&lt;/h2&gt;

&lt;p&gt;AI engineering is a careful balance between model &lt;strong&gt;quality&lt;/strong&gt;, &lt;strong&gt;latency&lt;/strong&gt;, and &lt;strong&gt;cost&lt;/strong&gt;. Most companies will opt for lower-quality models that are faster and cheaper. At scale, even minor latency regressions can degrade customer experience. Pareto optimization of foundation models can be done using public model benchmarks and internal evaluation tools, such as LangSmith. Price can be applied to each benchmark to provide a more holistic view of the opportunity costs associated with using one model over another.&lt;/p&gt;

&lt;p&gt;Latency metrics for models include time-to-first-token, time per token, time between tokens, time per query, and more. Essentially, measuring the deltas between tokens and queries provides AI engineers with a latency benchmark that can be extrapolated across larger requests and multiple turns.&lt;/p&gt;
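
&lt;p&gt;As an illustration of those deltas, here is a tiny timing helper that works against any streaming token iterator; the stream object is a stand-in, not a particular provider’s client.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import time

def stream_latency(stream):
    # Record a timestamp as each token arrives from any iterable.
    start = time.perf_counter()
    stamps = [time.perf_counter() for _ in stream]
    if not stamps:
        return None
    gaps = len(stamps) - 1
    return {
        &quot;ttft_s&quot;: stamps[0] - start,  # time-to-first-token
        &quot;avg_tbt_s&quot;: (stamps[-1] - stamps[0]) / gaps if gaps else 0.0,
        &quot;total_s&quot;: stamps[-1] - start,  # time per query
    }

# Fake stream that yields five tokens with a short delay each:
fake = (time.sleep(0.05) or tok for tok in [&quot;the&quot;, &quot;sky&quot;, &quot;is&quot;, &quot;blue&quot;, &quot;.&quot;])
print(stream_latency(fake))
&lt;/code&gt;&lt;/pre&gt;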

&lt;h1 id=&quot;model-selection&quot;&gt;Model Selection&lt;/h1&gt;

&lt;p&gt;The model selection process can become quite nuanced if you decide to optimize for speed and cost on a per-service basis. When comparing different models, you need to differentiate between &lt;strong&gt;hard attributes&lt;/strong&gt; (what is impossible or impractical for you to change) and &lt;strong&gt;soft attributes&lt;/strong&gt; (what you are able to change). Typically, hard attributes are business requirements, while soft attributes comprise metrics like accuracy, toxicity, and factual consistency, i.e. things that can be massaged through prompt engineering.&lt;/p&gt;

&lt;p&gt;A high-level evaluation workflow looks like this (a toy sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Filter out models whose hard attributes conflict with your desired application.&lt;/li&gt;
  &lt;li&gt;Use benchmarks to narrow down a model based on accuracy, cost, and latency.
    &lt;ul&gt;
      &lt;li&gt;There are also considerations surrounding things like data security and open-access.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Run experiments using your own internal evals to confirm and shortlist viable models.&lt;/li&gt;
  &lt;li&gt;Once selected, continually monitor your selected model with evals and human verification.
    &lt;ul&gt;
      &lt;li&gt;You can compare internal evals to customer CSAT scores to validate production models.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
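
&lt;p&gt;Here is the toy sketch of the filtering and ranking steps above. Every model entry, attribute, and weight is a made-up placeholder meant to show the shape of the workflow, not real benchmark data.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Step 1: filter on hard attributes; step 2: rank on soft attributes.
candidates = [
    {&quot;name&quot;: &quot;model-a&quot;, &quot;on_prem&quot;: True, &quot;accuracy&quot;: 0.81, &quot;cost_per_1k&quot;: 0.40, &quot;p50_ms&quot;: 900},
    {&quot;name&quot;: &quot;model-b&quot;, &quot;on_prem&quot;: False, &quot;accuracy&quot;: 0.88, &quot;cost_per_1k&quot;: 1.20, &quot;p50_ms&quot;: 450},
    {&quot;name&quot;: &quot;model-c&quot;, &quot;on_prem&quot;: True, &quot;accuracy&quot;: 0.78, &quot;cost_per_1k&quot;: 0.10, &quot;p50_ms&quot;: 300},
]

def meets_hard_requirements(m):
    return m[&quot;on_prem&quot;]  # e.g. a data-security requirement you cannot change

def score(m):
    # Weighted accuracy/cost/latency trade-off; tune to your business metrics.
    return m[&quot;accuracy&quot;] - 0.05 * m[&quot;cost_per_1k&quot;] - 0.0001 * m[&quot;p50_ms&quot;]

shortlist = sorted(filter(meets_hard_requirements, candidates), key=score, reverse=True)
print([m[&quot;name&quot;] for m in shortlist])  # ['model-c', 'model-a']
&lt;/code&gt;&lt;/pre&gt;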

&lt;h2 id=&quot;model-build-versus-model-buy&quot;&gt;Model Build Versus Model Buy&lt;/h2&gt;

&lt;p&gt;There will always be a performance and accuracy gap between commercial models and open-source models because there aren’t enough financial incentives to release highly performant models for free. In the real world, organizations typically &lt;em&gt;open-source their weaker models and sell their stronger models.&lt;/em&gt; This creates a gap in the ecosystem of available models; open-source models may be perfectly performant for certain applications, but in general, commercially available models have continued to outpace what’s available for free.&lt;/p&gt;

&lt;p&gt;There’s a cost-benefit analysis that needs to happen when commercial model usage grows. Substantial effort and capital are needed to build, maintain, and serve your own internal models, which means the typical investment required is quite high. One benefit of maintaining your own models is that your business has full control over the model’s training data (potentially leading to more niche models for highly specialized fields) and the outputs (allowing organizations to micro-manage the kinds of outputs their models produce).&lt;/p&gt;

&lt;h2 id=&quot;functionality-of-internal-models&quot;&gt;Functionality Of Internal Models&lt;/h2&gt;

&lt;p&gt;One major benefit to building your own models is specializing the model to your niche requests. An organization can fine-tune scalability, function use, structured outputs, and guardrails to meet their personalized needs. While this may be overkill for most organizations, it’s important to understand the trade-offs between model control and external provider dependency. If a third party removes key functionality, customers must react, whereas those sorts of pivots aren’t a concern with in-house models.&lt;/p&gt;

&lt;h2 id=&quot;benchmarks--data-contamination&quot;&gt;Benchmarks &amp;amp; Data Contamination&lt;/h2&gt;

&lt;p&gt;One concern that has cropped up recently is &lt;em&gt;the saturation of publicly available benchmarks&lt;/em&gt;, so much so that providers like Hugging Face have had to update their benchmarks with fresh examples &amp;amp; evals, and more complex asks. This isn’t the first time that they’ve done this, and they’re set to do it again once the current generation of foundation models saturate the leaderboards with near-identical results. Essentially, public benchmarks need to remain nimble and agile to ensure that their data remains coherent and accurate. The older a leaderboard is, the less valuable its results are in evaluating present-day models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data contamination&lt;/strong&gt; typically happens indirectly in public benchmarks. One example would be using math textbooks to train your model while someone else uses that same textbook to create evals. The benchmark result for this hypothetical model would be inaccurate because we’d be inadvertently using the same training data to evaluate the model. A few ways to deal with data contamination include n-gram overlapping (filtering out sequences of matching tokens in an evaluation sample if they match what was seen in the training data) and perplexity (low perplexity scores mean that the model has likely seen the data before). Note that n-gram overlapping is more accurate but is quite time-consuming and expensive since you’re comparing n-token string subsets between a large training set and a (potentially large) example evaluation set.&lt;/p&gt;
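
&lt;p&gt;A minimal sketch of the n-gram overlap idea, assuming whitespace tokenization and a 13-gram window (both simplifications; real pipelines tokenize properly and hash n-grams to cope with corpus scale):&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def ngrams(tokens, n=13):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_text, eval_text, n=13):
    # Flag the eval sample if any of its n-grams also appears verbatim
    # in the training data. Whitespace split and n=13 are assumptions.
    shared = ngrams(train_text.split(), n) &amp;amp; ngrams(eval_text.split(), n)
    return bool(shared)
&lt;/code&gt;&lt;/pre&gt;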

&lt;h1 id=&quot;designing-an-evaluation-pipeline&quot;&gt;Designing An Evaluation Pipeline&lt;/h1&gt;

&lt;p&gt;There are three main steps outlined for designing an evaluation pipeline:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evaluate all components in your desired system&lt;/strong&gt; to determine the necessary attributes of your AI models. Whether your evaluation is per-task, per-turn, or per-intermediate-output, you need to identify the evaluation framework that you’ll use beforehand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create evaluation guidelines.&lt;/strong&gt; This is the most important step in the pipeline. You must define both what the application &lt;em&gt;should do&lt;/em&gt; and what the application &lt;em&gt;shouldn’t do&lt;/em&gt;. The more explicit your guidelines, the more accurate your evaluation will be. There is high variability when guidelines are vague or subjective, so be crystal clear when you build up your evals. Try to tie evaluation metrics to your business metrics. Moreover, try to include examples wherever possible to allow the LLM-as-a-judge to leverage baseline behaviour.&lt;/p&gt;
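
&lt;p&gt;To make “explicit guidelines plus examples” concrete, here is an illustrative (entirely made-up) judge prompt; the rubric, scale, and baseline example are placeholders you would adapt to your own business metrics.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# A made-up LLM-as-a-judge prompt: explicit should / should-not
# guidelines plus one scored example as baseline behaviour.
JUDGE_PROMPT = &quot;&quot;&quot;Score the reply from 1 (bad) to 5 (good).
The reply SHOULD: answer the question using only the given context.
The reply SHOULD NOT: speculate beyond the context or exceed 100 words.

Example
Question: When is the store open?
Context: Open 9-5, Monday to Friday.
Reply: We are open 9 to 5 on weekdays.
Score: 5

Question: {question}
Context: {context}
Reply: {reply}
Score:&quot;&quot;&quot;

prompt = JUDGE_PROMPT.format(
    question=&quot;When is the store open?&quot;,
    context=&quot;Open 9-5, Monday to Friday.&quot;,
    reply=&quot;We never close.&quot;,  # a judge following the rubric should score this low
)
&lt;/code&gt;&lt;/pre&gt;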

&lt;p&gt;&lt;strong&gt;Define evaluation methods and data.&lt;/strong&gt; When selecting an evaluation method, try to tie specialized judges to matching functionality, like using a toxicity classifier to evaluate a bot whose purpose is to deal with hostile customers. When logprobs are available, use them; they are a great metric for determining a model’s confidence in a generated token.
Like a snake eating its own tail, you should also strive to evaluate your own eval pipelines. Ask yourself “is our eval pipeline getting the right signals,” “how reliable is my pipeline overall,” or “how correlated are my metrics” to form a more all-encompassing understanding of your pipeline. Revisit and iterate on your evals regularly; evals are active components of the product and should be treated as such.&lt;/p&gt;
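
&lt;p&gt;For the logprob point, here is a small sketch that turns per-token logprobs (however your provider exposes them) into average-confidence and perplexity signals; the numbers are made up.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import math

def confidence_metrics(token_logprobs):
    # Higher average logprob (closer to 0) and lower perplexity
    # indicate higher model confidence in the generated tokens.
    avg_lp = sum(token_logprobs) / len(token_logprobs)
    return {&quot;avg_logprob&quot;: avg_lp, &quot;perplexity&quot;: math.exp(-avg_lp)}

print(confidence_metrics([-0.10, -0.05, -1.20, -0.30]))  # made-up values
&lt;/code&gt;&lt;/pre&gt;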

&lt;h1 id=&quot;final-thoughts-on-ai-evaluation&quot;&gt;Final Thoughts on AI Evaluation&lt;/h1&gt;

&lt;p&gt;Evaluating AI systems isn’t just about metrics — it’s about aligning models with business goals, ensuring user safety, and iterating constantly. Whether you’re using off-the-shelf models or building your own, a strong evaluation pipeline is the backbone of reliable AI systems. This book has been eye-opening and I strongly encourage you, dear reader, to read through it if you build software using AI models.&lt;/p&gt;
</description>
                <pubDate>Thu, 26 Jun 2025 00:00:00 +0000</pubDate>
                <link>https://bassi.li/articles/evaluating-ai-models</link>
                <guid isPermaLink="true">https://bassi.li/articles/evaluating-ai-models</guid>
                
                <category>AI</category>
                
                <category>Evals</category>
                
                <category>LLM</category>
                
                <category>Development</category>
                
                
            </item>
        
            <item>
                <title>Are We Sacrificing Developer Skills for AI Convenience?</title>
                <description>&lt;p&gt;Last week, I had the pleasure of attending All Things Open in Raleigh, North Carolina. Now, AI may not have been the explicit focus of the conference, but let’s just say it completely stole the show. It felt like every other session was talking about it. &lt;a href=&quot;https://x.com/cmcluck&quot;&gt;Craig McLuckie&lt;/a&gt; from Stacklok covered considerations for securing AI-generated code; &lt;a href=&quot;https://x.com/chrisraygill&quot;&gt;Chris Gill&lt;/a&gt; introduced Firebase Genkit, a framework for building AI-powered applications using RAG (which I learned does not, in fact, refer to a cleaning cloth); and my friend even attended a talk with &lt;a href=&quot;https://x.com/rishabincloud&quot;&gt;Rishab Kumar&lt;/a&gt; on LangChain. All told, &lt;a href=&quot;https://2024.allthingsopen.org/schedule&quot;&gt;there were over 40 talks about AI and its implications at this year’s conference&lt;/a&gt;, and by the end, I felt like I’d attended an AI-themed family reunion.&lt;/p&gt;

&lt;p&gt;Like most developers, I’ve been leaning on GitHub Copilot in VSCode. It’s become my sidekick for writing documentation, generating tests, and fixing logical bugs faster than I can write a for-loop. And I’ve gotten pretty comfortable with its assistance—so much so that I’ve perfected what I call the “Copilot Pause.” What’s the “Copilot Pause,” you ask? It’s the art of typing a few characters and then freezing, like a deer in headlights, waiting for Copilot’s auto-suggestion to pop up and finish my thoughts for me. Time it right, and you’re practically writing code with a trusty AI squire at your side, coding together as the sun sets romantically on another productive day.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“[By] no longer using Copilot, I’m taking my coding skills back.” —Dreams of Code&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But as I nestled deeper into my AI-boosted workflow, I couldn’t ignore a gnawing feeling. Sure, Copilot has transformed my productivity, but I wondered what I might be sacrificing for this newfound convenience. On one shoulder, Copilot whispers suggestions like a guardian angel of efficiency, nudging me forward with pre-generated snippets and ready-made fixes. On the other shoulder, I feel a different presence: the looming worry that my skills might start slipping away, like sand through an hourglass. With Copilot, I’m no longer wrestling with code the way I used to. I’m just validating its output, cruising through the “how” and skimming right over the “why.” It’s a bit like assembling IKEA furniture without the instructions—sure, the end product might look alright, but do I really understand how that drawer slider works?&lt;/p&gt;

&lt;p&gt;These days, a stubborn Kubernetes networking issue no longer drags me down documentation rabbit holes. Instead, I pass Copilot a Helm chart and watch it handle things. It’s efficient, but I’m realizing it’s making me a slightly more passive participant in my own projects.&lt;/p&gt;

&lt;p&gt;At the conference, people were hyped about how these tools make it easier for junior developers to contribute, allowing them to join in before mastering every detail. But here’s the rub: if developers become too reliant on AI, they might miss out on those valuable “figuring it out from scratch” experiences. There’s something irreplaceable about facing down error messages and untangling logic knots. As painful as those hours are, they shape a stronger developer—and skipping those steps might leave us with devs who can contribute to a project but lack a real understanding of what’s happening under the hood.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;“As long as job interviews require you to code on a whiteboard or in basic online text editors, you need to remember your basics.” —Vincent Stollenwerk&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, here I am, at a crossroads. AI tools are here to stay, and they’re undeniably powerful. But I worry that my skills—both as a developer and a writer—might start slipping as I let Copilot take more control. And for junior developers, I fear they’ll bridge knowledge gaps with AI rather than with curiosity and good old-fashioned persistence. We could end up with developers who can add to a project but don’t have a clue how it all fits together. And that, to me, is both remarkable and a little terrifying.&lt;/p&gt;

&lt;p&gt;For now, I’ve disabled Copilot auto-complete in VSCode. Instead, I’m using the Copilot chat window when I really need it, like to generate documentation or scaffold a test. It’s definitely clunkier than auto-complete, but that friction feels like it’s keeping my skills sharper. Disabling Copilot was a shock to the system at first—I’d catch myself pausing, waiting for suggestions, only to see my blinking cursor and a blank screen staring back at me. That was a wake-up call. Maybe someday I’ll turn the auto-complete back on, but for now, I’m more interested in preserving my skills than just speeding up my output.&lt;/p&gt;

&lt;p&gt;In writing this, I came across a few developers echoing the same sentiment.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://youtu.be/Wap2tkgaT1Q?si=yc8zkebTFgKuH2GW&quot;&gt;Why I’m no longer using Copilot by Dreams of Code&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://anyblockers.com/posts/avoid-the-copilot-pause&quot;&gt;Avoid the copilot pause by Eric Zakariasson&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://vstollen.me/notes/copilot-pause&quot;&gt;The Copilot Pause by Vincent Stollenwerk&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=GkmUwDXvWiQ&quot;&gt;Why I Quit Copilot by ThePrimeagen&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m happy to know that I’m not the only one who’s been concerned about this.&lt;/p&gt;
</description>
                <pubDate>Sat, 02 Nov 2024 00:00:00 +0000</pubDate>
                <link>https://bassi.li/articles/developer-skills-and-ai-convenience</link>
                <guid isPermaLink="true">https://bassi.li/articles/developer-skills-and-ai-convenience</guid>
                
                <category>Development</category>
                
                <category>AI</category>
                
                <category>Career</category>
                
                
            </item>
        
    </channel>
</rss>