<p>7 books that stood the test of time in 2023 (Geoffrey Litt, 2023-12-17)</p><p>It’s the most wonderful time of the year: when people proudly announce how many books they have read in the past 12 months. 10 books, 20 books, 57 books! Worry not—I know you don’t care, and besides, I have no idea how many books I read this year.</p>
<p>In lieu of that, here’s a short list of <strong>some favorite books I read <em>before</em> 2023 that have stuck with me this year</strong> and changed the way I think. Seven masterpieces on AI, cooking, art, houses, product design, computational media, and trees:</p>
<figure style="margin: 0;">
<img src="/images/article_images/seven-books.jpg" alt="Six books on a floor, corresponding to the list below in this post">
<figcaption>Six of the seven books. The seventh I only have on Kindle, sorry Ken!</figcaption>
</figure>
<h3 id="the-most-human-human-by-brian-christian">The Most Human Human, by Brian Christian</h3>
<p>A book about humanity, disguised as a book about AI. It taught me how to have deeper conversations and find more meaning in my work. Amid a sea of spilled ink on AI, Brian Christian has simply asked more interesting questions. Notably, this book was written in 2011, before the current wave—yet it’s still remarkably relevant.</p>
<p><a href="https://www.goodreads.com/en/book/show/8884400">See it on Goodreads</a></p>
<h3 id="an-everlasting-meal-by-tamar-adler">An Everlasting Meal, by Tamar Adler</h3>
<p>This book changed the way I cook. It teaches the correct way to think about home cooking – not as a chore, an “obstacle”, or an optimized process… but as a simple, natural act of creativity. One of the wisest books I know.</p>
<p><a href="https://www.goodreads.com/en/book/show/11300085">See it on Goodreads</a></p>
<h3 id="art-fear-by-david-bayles-and-ted-orland">Art & Fear, by David Bayles and Ted Orland</h3>
<p>A slim little manual about how to overcome the fear and keep creating. Subtle tips on the role of talent, managing the vision-execution gap, quantity vs quality. I might not have kept going with research if I hadn’t read this book.</p>
<p><a href="https://www.goodreads.com/en/book/show/187633">See it on Goodreads</a></p>
<h3 id="the-production-of-houses-by-christopher-alexander-et-al">The Production of Houses, by Christopher Alexander et al.</h3>
<p>Christopher Alexander thought people could design their own homes. His most famous books, The Timeless Way of Building and A Pattern Language, are brilliant but can be a bit abstract. The Production of Houses shows what actually happened, concretely, when he and his team helped some people do the thing and design their own homes.</p>
<p>The result: some great successes, some strange contradictions to ponder.</p>
<p><a href="https://www.goodreads.com/book/show/106725.The_Production_of_Houses">See it on Goodreads</a></p>
<h3 id="creative-selection-by-ken-kocienda">Creative Selection, by Ken Kocienda</h3>
<p>This book shows that most product design is a dead end. It describes, in great detail, the Apple way—hard to achieve, but worth striving towards. I’m constantly remembering stories from this book in my own work. “Pick one keyboard!”</p>
<p><a href="https://www.goodreads.com/book/show/37638098-creative-selection">See it on Goodreads</a></p>
<h3 id="changing-minds-by-andy-disessa">Changing Minds, by Andy diSessa</h3>
<p>A foundational text for my research. I am always amazed how many people have not even heard of it. If you care about the “future of computing”, Bret Victor’s work, “computational literacy”… go read this book! I promise it will change your mind. I reference diSessa’s <a href="https://twitter.com/geoffreylitt/status/1153373693713817600">“nightmare bicycle”</a> concept all the time.</p>
<p><a href="https://www.goodreads.com/book/show/1752380">See it on Goodreads</a></p>
<h3 id="the-overstory-by-richard-powers">The Overstory, by Richard Powers</h3>
<p>To the extent that it’s possible to see the world from the perspective of trees, this novel got me to that place. Every time I’m in a forest now, I think about the trees: how long they’ve been there, what they’re communicating to one another.</p>
<p><a href="https://www.goodreads.com/en/book/show/40180098">See it on Goodreads</a></p>
<hr>
<p>Look, I could write so much more about any one of these books (and I’m happy to answer any questions!) but honestly, it feels hard to do them justice.</p>
<p>They’re all 5 stars, on both substance and prose. Well worth your time, and any of them could be a great gift to the right person. I hope you have great holidays!</p>
<p>Codifying a ChatGPT workflow into a malleable GUI (Geoffrey Litt, 2023-07-25)</p><p>In my previous post, <a href="/2023/03/25/llm-end-user-programming.html">Malleable software in the age of LLMs</a>, I laid out a theory for how LLMs might enable a new era of people creating their own personal software:</p>
<blockquote>
<p>I think it’s likely that soon all computer users will have the ability to develop small software tools from scratch, and to describe modifications they’d like made to software they’re already using.</p>
<p>In other words, LLMs will represent a step change in tool support for end-user programming: the ability of normal people to fully harness the general power of computers without resorting to the complexity of normal programming. Until now, that vision has been bottlenecked on turning fuzzy informal intent into formal, executable code; now that bottleneck is rapidly opening up thanks to LLMs.</p>
</blockquote>
<p>Today I’ll <strong>share a real example where I found it useful to build custom personal software with an LLM</strong>. Earlier this week, I used GPT-4 to code an app that helps me draft text messages in English and translate them to Japanese. The basic idea: I paste in the context for the text thread and write my response in English; I get back a translation into Japanese. The app has a couple other neat features, too: I can drag a slider to tweak the formality of the language, and I can highlight any phrase to get a more detailed explanation.</p>
<p>The whole thing is ugly and thrown together in no time, but it has exactly the features I need, and I’ve found it quite useful for planning an upcoming trip to Japan.</p>
<p><img src="/images/article_images/texting-app-teaser.png" alt="" /></p>
<p>The app uses the GPT-4 API to do the actual translations. So there are two usages of LLMs going on here: I used an LLM to code the app, and then the app also uses an LLM when it runs to do the translations. Sorry if that’s confusing, 2023 is weird.</p>
<p>You may ask: why bother making an app for this? Why not just ask ChatGPT to do the translations? I’m glad you asked—that’s what this post is all about! In fact, I started out doing these translations in ChatGPT, but <strong>I ended up finding this GUI nicer to use than raw ChatGPT for several reasons</strong>:</p>
<ul>
<li>It encodes a prescriptive workflow so I don’t need to fuss with prompts as much.</li>
<li>It offers convenient direct manipulation affordances like text boxes and sliders.</li>
<li>It makes it easier to share a workflow with other people.</li>
</ul>
<p>(Interestingly, these are similar to the reasons that so many startups are building products wrapping LLM prompts—the difference here is that I’m just building the tool for myself, and not trying to make a product.)</p>
<p>A key point is that making this personal GUI is only worth it because <strong>GPT also lowers the cost of making and iterating on the GUI!</strong> Even though I’m a programmer, I wouldn’t have made this tool without LLM support. It’s not only the time savings, it’s also the fact that I don’t need to turn on my “programmer brain” to make these tools; I can think at a higher level and let the LLM handle the details.</p>
<p>There are also tradeoffs to consider when moving from ChatGPT into a GUI tool: the resulting workflow is more rigid and less open-ended than a ChatGPT session. In a sense this is the whole point of a GUI. But the GUI isn’t necessarily as limiting as it might seem, because remember, it’s <em>malleable</em>—I built it myself using GPT and can quickly make further edits. This is a very different situation than using a fixed app that someone else made! Below I’ll share one example of how I edited this tool on the fly as I was using it.</p>
<p>Overall I think this experience suggests an intriguing workflow of <strong>codifying a ChatGPT workflow into a malleable GUI</strong>: starting out with ChatGPT, exploring the most useful way to solve a task, and then once you’ve landed on a good approach, codifying that approach in a GUI tool that you can use in a repeatable way going forward.</p>
<p>Alright, on to the story of how this app came about.</p>
<hr>
<h2 id="chatgpt-is-a-good-translator-usually">ChatGPT is a good translator (usually 🙃)</h2>
<p>I’m going on a trip to Japan soon and have been on some text threads where I need to communicate in Japanese. I grew up in Japan but my writing is rusty and painfully slow these days. One particular challenge for me is using the appropriate level of formality with extended family and other family acquaintances—I have fluent schoolyard Japanese but the nuances of formal grown-up Japanese can be tricky.</p>
<p>I started using ChatGPT to make this process faster by asking it to produce draft messages in Japanese based on my English input. I quickly realized <strong>there are some neat benefits to ChatGPT vs. a traditional translation app</strong>. I can give it the full context of the text thread so it can incorporate that into its translation. I can steer it with prompting: asking it to tweak the formality or do a less word-for-word translation. I can ask follow-up questions about the meaning of a word. These capabilities were all gamechangers for this task; they really show why smart chatbots can be so useful!</p>
<p>You may be wondering: how good were the translations? I’d say: good enough to be spectacularly useful to me, <em>given that I can verify and edit</em>. Often they were basically perfect. Sometimes they were wrong in huge, hilarious ways—flipping the meaning of a sentence, or swapping the name of a train station for another one (sigh, LLMs…).</p>
<p>In practice these mistakes didn’t matter too much though. I’m slow at writing in Japanese but can read basic messages easily, so I just fix the errors and they aren’t dealbreakers. <strong>When creation is slow and verification is fast, it’s a sweet spot for using an LLM.</strong></p>
<h2 id="honing-the-workflow">Honing the workflow</h2>
<p>As I translated more messages and saw ways that the model failed, I developed some little prompting tricks that seemed to produce better translations. Things like this:</p>
<blockquote>
<p>Below is some context for a text message thread:</p>
<p>…paste thread…</p>
<p>Now translate my message below to japanese. make it sound natural in the flow of this conversation. don’t translate word for word, translate the general meaning.</p>
<p>…write message…</p>
</blockquote>
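<p>Once a template like this stabilizes, it can live in a small helper function instead of a chat scratchpad. Here’s a minimal sketch of the idea (the function name is mine, not from the app’s actual code):</p>
<pre><code class="language-javascript">// Assemble the translation prompt from the pasted thread and the
// English draft, following the template above. (Hypothetical helper;
// the real app's code may differ.)
function buildTranslationPrompt(thread, message) {
  return [
    "Below is some context for a text message thread:",
    "",
    thread,
    "",
    "Now translate my message below to japanese. " +
      "make it sound natural in the flow of this conversation. " +
      "don't translate word for word, translate the general meaning.",
    "",
    message,
  ].join("\n");
}
</code></pre>
<p>The GUI fills in the two blanks for you, which is exactly the tedious copy-paste step that the chat workflow required every time.</p>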
<p>I also learned some typical follow-up requests I would often make after receiving the initial translation: things like asking to adjust the formality level up or down.</p>
<p>Once I had landed on these specific prompt patterns, it made my interactions more scripted. Each time I would need to dig up my prompt text for this task, copy-paste it in, and fill in the blanks for this particular translation. When asking follow-up questions I’d also copy-paste phrasings from previous chats that had proven successful. <strong>At this point it didn’t feel like an open-ended conversation anymore; it felt like I was tediously executing a workflow made up of specific chat prompts.</strong></p>
<p>I also found myself wanting a solid tool that I could return to. ChatGPT chats feel amorphous and hard to revisit: where do I store my prompts? How do I even remember what useful workflows I’ve come up with? I basically wanted a window I could pop open and get a quick translation.</p>
<h2 id="making-a-gui-with-gpt">Making a GUI with GPT</h2>
<p>So, I asked GPT-4 to build me a GUI codifying this workflow. The app is a frontend-only React.js web app. It’s hosted on <a href="https://replit.com/">Replit</a>, which makes it easy to spin up a new project in one click and then share a link with people. (You can see the current code <a href="https://replit.com/@GeoffreyLitt/TextMessageTranslator#src/App.jsx">here</a> if you’re curious.) I just copy-pasted the GPT-generated code into Replit.</p>
<p><img src="/images/article_images/texting-replit.png" alt="" /></p>
<p>The initial version of the app was very simple: it basically just accepted a text input and then made a request to the GPT-4 API asking for a natural-sounding translation. The early designs generated by ChatGPT were super primitive:</p>
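<p>Since the app is frontend-only, the translation request goes straight from the browser to the OpenAI chat completions endpoint. A rough sketch of that core call (the prompt wording and helper names are illustrative, not the app’s actual code):</p>
<pre><code class="language-javascript">// Build the chat-completion request body for a given prompt string.
function buildRequestBody(prompt) {
  return {
    model: "gpt-4",
    messages: [{ role: "user", content: prompt }],
  };
}

// The network call itself; apiKey comes from the settings pane
// (stored in localStorage rather than in the code).
async function translate(apiKey, prompt) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildRequestBody(prompt)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
</code></pre>
<p>Keeping everything in the browser is what makes the one-click Replit hosting work: there’s no backend to deploy, just a static page and a user-provided API key.</p>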
<p><img src="/images/article_images/early-designs.png" alt="" /></p>
<p>Asking it for a “professional and modern” redesign helped get the design looking passable. I then asked GPT to add a <em>formality slider</em> to the app. The new app requests three translations of varying formality, and then lets the user drag a slider to instantly choose between them 😎</p>
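<p>The trick that makes the slider feel instant is requesting all three translations up front, so dragging the slider just indexes into the results. A sketch of that data model (the formality labels are my assumption):</p>
<pre><code class="language-javascript">// Assumed formality labels; the slider maps positions 0–2 onto them.
const FORMALITY_LEVELS = ["casual", "polite", "very formal"];

// One prompt per formality level, all sent when the user clicks
// translate, so no extra API calls happen while dragging.
function buildFormalityPrompts(basePrompt) {
  return FORMALITY_LEVELS.map(
    (level) => `${basePrompt}\n\nMake the Japanese sound ${level}.`
  );
}

// Slider handler: just pick the precomputed translation.
function pickTranslation(translations, sliderValue) {
  return translations[sliderValue];
}
</code></pre>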
<video autoplay loop controls="controls" preload="auto" muted="muted" data-video="0" type="video/mp4" src="/images/article_images/text-app.mp4" width="100%"></video>
<p>GPT-4 did most of the coding of the UI. I didn’t measure how long it took, but subjectively, the whole thing felt pretty effortless; <strong>it felt more like asking a friend to build an app for me than building it myself</strong>, and I never engaged my detailed programmer brain. I still haven’t looked very closely at the code. GPT generally produced good results on every iteration. At one point it got confused about how to call the OpenAI API, but pasting in some recent documentation got it sorted out. I’ve included some of the coding prompts I used at the <a href="#appendix">bottom of this post</a> if you’re curious about the details.</p>
<p>At the same time, it’s important to note that <strong>my programming background did substantially help the process along</strong> and I don’t think it would have gone that well if I didn’t know how to make React UIs. I was able to give the LLM a detailed spec, which was natural for me to write. For example: I suggested storing the OpenAI key as a user-provided setting in the app UI rather than putting it in the code, because that would let us keep the app frontend-only. I also helped fix some minor bugs.</p>
<p>I do believe it’s possible to get to the point where an LLM can support non-programmers in building custom GUIs (and that’s in fact one of my main research goals at the moment). But it’s a much harder goal than supporting programmers, and will require a lot more work on tooling. More on this later.</p>
<h2 id="iterating-on-the-fly">Iterating on the fly</h2>
<p>A few times I noticed that the Japanese translations included phrases I didn’t understand. Once this need came up a few times, I decided to add it as a feature in my GUI. <strong>I asked GPT to modify the code so that I can select a phrase and click a button to get an explanation in context:</strong></p>
<p><img src="/images/article_images/explain-phrase.png" alt="" /></p>
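<p>Under the hood, the feature boils down to wrapping the highlighted phrase in a reusable prompt. A sketch of the idea (the wording is illustrative, not the app’s actual prompt):</p>
<pre><code class="language-javascript">// "Explain phrase" sketch: build an in-context explanation request
// for whatever the user highlighted.
function buildExplainPrompt(phrase, translation) {
  return (
    `Here is a Japanese message:\n\n${translation}\n\n` +
    `Explain the meaning and nuance of the phrase "${phrase}" ` +
    `as it is used in this message.`
  );
}

// In the React component, the highlighted phrase would come from
// window.getSelection().toString() when the button is clicked.
</code></pre>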
<p>This tight iteration loop felt awesome. Going from wanting the feature to having it in my app was accomplished in minutes with very little effort. This shows the benefit of having a <em>malleable GUI</em> which I control and I can quickly edit using an LLM. My feature requests aren’t trapped in a feedback queue, I can just build them for myself. It’s not the best-designed interaction ever, but it gets the job done.</p>
<p>I’ve found that having the button there encourages me to ask for explanations more often. Before, when I was doing the translations in ChatGPT, I would need to explicitly think to write a follow-up message asking for an explanation. Now I have a button reminding me to do it, and the button also uses a high-quality prompt that I’ve developed.</p>
<h2 id="sharing-the-tool">Sharing the tool</h2>
<p>My brother asked to try the tool. I sent him the Replit link and he was able to use it.</p>
<p>I think sharing a GUI is probably way more effective than trying to share a complex ChatGPT workflow with various prompts patched together. The UI encodes what I’ve learned about doing this particular task effectively, and provides clear affordances that anyone can pick up quickly.</p>
<h2 id="from-chatbot-to-gui">From chatbot to GUI</h2>
<p>What general lessons can we take away from my experience here? I think it gestures at two big ideas.</p>
<p>The first one is that <strong>chatbots are not always the best interface for a task</strong>, even one like translation that involves lots of natural language and text. Amelia Wattenberger wrote a <a href="https://wattenberger.com/thoughts/boo-chatbots">great piece</a> explaining some of the reasons. It’s worth reading the whole thing, but here’s a key excerpt about the value of affordances:</p>
<blockquote>
<p>Good tools make it clear how they should be used. And more importantly, how they should not be used. If we think about a good pair of gloves, it’s immediately obvious how we should use them. They’re hand-shaped! We put them on our hands. And the specific material tells us more: metal mesh gloves are for preventing physical harm, rubber gloves are for preventing chemical harm, and leather gloves are for looking cool on a motorcycle.</p>
<p>Compare that to looking at a typical chat interface. The only clue we receive is that we should type characters into the textbox. The interface looks the same as a Google search box, a login form, and a credit card field.</p>
</blockquote>
<p>This principle clearly holds when designing a product that other people are going to use. But perhaps surprisingly, in my experience, <strong>affordances are actually useful even when designing a tool for myself!</strong> Good affordances can help my future self remember how to use the tool. The “explain phrase” button reminds me that I should ask about words I don’t know.</p>
<p>I also find that making a UI makes a tool more memorable. My custom GUI is a visually distinctive artifact that lives at a URL; this helps me remember that I have the tool and can use it. Having a UI makes my tool feel more like a reusable artifact than a ChatGPT prompt.</p>
<p>Now, it’s not quite as simple as “GUI good, chatbot bad”—there are tradeoffs. For my translation use case, I found ChatGPT super helpful for my initial explorations. The open-endedness of the chatbot gave it a huge leg up over Google Translate, a more traditional application with more limited capabilities and clearer affordances. I was able to explore a wide space of useful features and find the ones that I wanted to keep using.</p>
<p>I think this suggests a natural workflow: <strong>start in chat, and then codify a UI if it’s getting annoying doing the same chat workflow repeatedly.</strong></p>
<p>By the way, one more thing: there are obviously many other visual affordances to consider besides the ones I used in this particular example. For instance, here’s another GPT-powered GUI tool I built a couple months ago, where I can drag-and-drop in a file and see useful conversions of that file into different formats:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wanted to convert a JSON file of a chat transcript into nice markdown text for sharing w/ people…<br><br>so I had GPT generate an ephemeral React UI where I can drag in the JSON file and it outputs the markdown🤓<br><br>reflections on the process: <a href="https://t.co/WGwBBtEGiT">pic.twitter.com/WGwBBtEGiT</a></p>— Geoffrey Litt (@geoffreylitt) <a href="https://twitter.com/geoffreylitt/status/1654246096212992004?ref_src=twsrc%5Etfw">May 4, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<h2 id="the-joy-of-editing-our-tools">The joy of editing our tools</h2>
<p>Another takeaway: <strong>it feels great to use a tiny GUI made just for my own needs</strong>. It does only what I want it to do, nothing more. The design isn’t going to win any awards or get VC funding, but it’s good enough for what I want. When I come across more things that the app needs to do, I can add them.</p>
<p>Robin Sloan has this delightful idea that <a href="https://www.robinsloan.com/notes/home-cooked-app/">an app can be a home-cooked meal</a>:</p>
<blockquote>
<p>When you liberate programming from the requirement to be professional and scalable, it becomes a different activity altogether, just as cooking at home is really nothing like cooking in a commercial kitchen. I can report to you: not only is this different activity rewarding in almost exactly the same way that cooking for someone you love is rewarding, there’s another feeling, too, specific to this realm. I have struggled to find words for this, but/and I think it might be the crux of the whole thing:</p>
<p>This messaging app I built for, and with, my family, it won’t change unless we want it to change. There will be no sudden redesign, no flood of ads, no pivot to chase a userbase inscrutable to us. It might go away at some point, but that will be our decision. What is this feeling? Independence? Security? Sovereignty?</p>
<p>Is it simply … the feeling of being home?</p>
</blockquote>
<p>Software doesn’t always need to be mass-produced like restaurant food, it can be produced intimately at small scale. My translator app feels this way to me.</p>
<p>In this example, using GPT-4 to code and edit the app is what enabled the feeling of malleability for me. It feels magical describing an app and having it appear on-screen within seconds. Little React apps seem to be the kind of simple code that GPT-4 is good at producing. You could even argue that it’s “just regurgitating other code it’s already seen”, but I don’t care—it made me the tool that I wanted.</p>
<p>I’m a programmer and I could have built this app manually myself without too much trouble. And yet, I don’t think I would have. The LLM is an order of magnitude faster than me at getting the first draft out and producing new iterations, which makes me much more likely to just give it a shot. This reminds me of how Simon Willison says that <a href="https://simonwillison.net/2023/Mar/27/ai-enhanced-development/">AI-enhanced development makes him more ambitious with his projects</a>:</p>
<blockquote>
<p>In the past I’ve had plenty of ideas for projects which I’ve ruled out because they would take a day—or days—of work to get to a point where they’re useful. I have enough other stuff to build already!</p>
<p>But if ChatGPT can drop that down to an hour or less, those projects can suddenly become viable.</p>
<p>Which means I’m building all sorts of weird and interesting little things that previously I wouldn’t have invested the time in.</p>
</blockquote>
<p>Simon’s description applies perfectly to my example.</p>
<p>It’s not just about the initial creation, it’s also about the fast iteration loop. I discussed the possibility of LLMs updating a GUI app in my <a href="/2023/03/25/llm-end-user-programming.html">previous post</a>:</p>
<blockquote>
<p>Next, consider LLMs applied to the app model. <strong>What if we started with an interactive analytics application, but this time we had a team of LLM developers at our disposal?</strong> As a start, we could ask the LLM questions about how to use the application, which could be easier than reading documentation.</p>
<p>But more profoundly than that, the LLM developers could go beyond that and <em>update</em> the application. When we give feedback about adding a new feature, our request wouldn’t get lost in an infinite queue. They would respond immediately, and we’d have some back and forth to get the feature implemented. Of course, the new functionality doesn’t need to be shipped to everyone; it can just be enabled for our team. This is economically viable now because we’re not relying on a centralized team of human developers to make the change.</p>
<p><img src="/images/article_images/llm-eup/llm-app.png" alt="" /></p>
</blockquote>
<p>It simply feels good to be using a GUI app, have an idea for how it could be different, and then have that new version running within seconds.</p>
<p>There’s a caveat worth acknowledging here: the story I shared in this post only worked under specific conditions. The app I made is extremely simple in functionality; a more complex app would be much harder to modify.</p>
<p>And I’m pretty confident that the coding workflow I shared in this post only worked because I’m a programmer. The LLM makes me much, much faster at building these simple kinds of utilities, but my programming knowledge still feels essential to keeping the process running. I’m writing fairly detailed technical specs, I’m making architectural choices, I’m occasionally directly editing the code or fixing a bug. The app is so small and simple that it’s easy for me to keep up with what’s going on.</p>
<p>I yearn for non-programmers to also experience software this way, as a malleable artifact they can change in the natural course of use. LLMs are clearly a big leap forward on this dimension, but there’s also a lot of work ahead. We’ll need to find ways for LLMs to work with non-programmers to specify intent, to help them understand what’s going on, and to fix things when they go wrong.</p>
<p>I’m optimistic that a combination of better tooling and improved models can get us there, at least for simpler use cases like my translator tool. I guess there’s only one way to find out 🤓 (<a href="https://buttondown.email/geoffreylitt">Subscribe to my email newsletter</a> if you want to follow along with my research in this area.)</p>
<hr>
<h2 id="recently">Recently…</h2>
<p>In the past few months I’ve given a couple talks relevant to the themes in this post.</p>
<p>In April I spoke at <a href="https://www.causalislands.com/">Causal Islands</a> about <a href="https://www.inkandswitch.com/potluck/">Potluck</a>, a programmable notes prototype I worked on with Max Schoening, Paul Shen, and Paul Sonnentag at Ink & Switch. In my talk I share a bunch of demos from our published essay, but I also show some newer demos of integrating LLMs to help author spreadsheets. (The embed below will jump you right to the LLM demos)</p>
<iframe width="100%" height="315" src="https://www.youtube.com/embed/bJ3i4K3hefI?start=1359" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<p>Also: a couple weeks ago, I presented my PhD thesis defense at MIT! I gave a talk called Building Personal Software with Reactive Databases. I talk about what makes spreadsheets great, and show a few projects I’ve worked on that aim to make it easier to build software using techniques from spreadsheets and databases.</p>
<iframe width="100%" height="315" src="https://www.youtube.com/embed/CPKsS3SJU4o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<hr>
<h2 id="related-reading">Related reading</h2>
<p>If you’re interested in diving deeper into ways of interacting with LLMs besides chatbots, I strongly recommend the following readings:</p>
<ul>
<li><a href="https://idl.cs.washington.edu/files/2019-AgencyPlusAutomation-PNAS.pdf">Agency plus automation: Designing artificial intelligence into interactive systems</a> by Jeffrey Heer</li>
<li><a href="https://magrawala.substack.com/p/unpredictable-black-boxes-are-terrible">Unpredictable Black Boxes are Terrible Interfaces</a> by Maneesh Agrawala</li>
<li><a href="https://dl.acm.org/doi/10.1145/267505.267514">Direct manipulation vs. interface agents</a>, a 1997 debate between Ben Shneiderman and Pattie Maes</li>
</ul>
<p>And for a more abstract angle on the example in this post, check out my previous post, <a href="/2023/03/25/llm-end-user-programming.html">Malleable software in the age of LLMs</a>!</p>
<hr>
<h2 id="appendix-prompts">Appendix: prompts</h2>
<p>Here are some of the prompts I used to make the translator app.</p>
<p>First, my general system prompt for UI coding:</p>
<blockquote>
<p>You are a helpful AI coding assistant. Make sure to follow the user’s instructions precisely and to the letter. Always reason aloud about your plans before writing the final code.</p>
<p>Write code in ReactJS. Keep the whole app in one file. Only write a frontend, no backend.</p>
<p>If the specification is clear, you can generate code immediately. If there are ambiguities, ask key clarifying questions before proceeding.</p>
<p>When the user asks you to make edits, suggest minimal edits to the code, don’t regenerate the whole file.</p>
</blockquote>
<p>Initial prompt for the texting app:</p>
<blockquote>
<p>I’d like you to make me an app that helps me participate in a text message conversation in Japanese by using an LLM to translate. Here’s the basic idea:</p>
<ul>
<li>I paste in a transcript of a text message thread into a box</li>
<li>I write the message I want to reply with (in english) into a different box</li>
<li>I click a button</li>
<li>the app shows me a Japanese translation of my message as output; there’s a copy button so i can copy-paste it easily.</li>
<li>the app talks to openai gpt-4 to do the translation. the prompt can be something like “here’s a text thread in japanese: &lt;thread&gt;. now translate my new message below to japanese. make it sound natural in the flow of this conversation. don’t translate word for word, translate the general meaning.” use the openai js library, some sample code pasted below.</li>
<li>the user can paste in their openai key in a settings pane, it gets stored in localstorage</li>
</ul>
</blockquote>
<p>One of the iterative edits for the texting app:</p>
<blockquote>
<p>make the following edits and output new code:</p>
<ul>
<li>write a css file and style the app to look professional and modern.</li>
<li>arrange the text thread in a tall box on the left, and then the new message and translation vertically stacked to the right</li>
<li>give the app a title: Japanese Texting Helper</li>
<li>hide the openai key behind a settings section that gets toggled open/closed at the bottom of the app</li>
</ul>
</blockquote>
<p>Malleable software in the age of LLMs (Geoffrey Litt, 2023-03-25)</p><p><img src="/images/article_images/llm-eup/robot-coding.png" alt="A robot and a human coding together. Image from Midjourney."></p>
<p>It’s been a wild few weeks for large language models. OpenAI <a href="https://cdn.openai.com/papers/gpt-4.pdf">released GPT-4</a>, which shows impressive gains on a variety of capabilities including coding. Microsoft Research <a href="https://www.microsoft.com/en-us/research/publication/sparks-of-artificial-general-intelligence-early-experiments-with-gpt-4/">released a paper</a> showing how GPT-4 was able to produce quite sophisticated code like a 3D video game without much prompting at all. OpenAI also released <a href="https://openai.com/blog/chatgpt-plugins">plugins for ChatGPT</a>, which are a productized version of the ReAct tool usage pattern I played around with in my <a href="https://www.geoffreylitt.com/2023/01/29/fun-with-compositional-llms-querying-basketball-stats-with-gpt-3-statmuse-langchain.html">previous post about querying NBA statistics using GPT</a>.</p>
<p>Amid all this chaos, many people are naturally wondering: <strong>how will LLMs affect the creation of software?</strong></p>
<p>One answer to that question is that LLMs will make skilled professional developers more productive. This is a safe bet since GitHub Copilot has already shown it’s viable. It’s also a comforting thought, because developers can feel secure in their future job prospects, and it doesn’t suggest structural upheaval in the way software is produced or distributed 😉</p>
<p>However, I suspect this won’t be the whole picture. While I’m confident that LLMs will become useful tools for professional programmers, I also think focusing too much on that narrow use risks missing the potential for bigger changes ahead.</p>
<p>Here’s why: <strong>I think it’s likely that soon all computer users will have the ability to develop small software tools from scratch, and to describe modifications they’d like made to software they’re already using.</strong> In other words, LLMs will represent a step change in tool support for <a href="https://www.inkandswitch.com/end-user-programming/"><em>end-user programming</em></a>: the ability of normal people to fully harness the general power of computers without resorting to the complexity of normal programming. Until now, that vision has been bottlenecked on turning fuzzy informal intent into formal, executable code; now that bottleneck is rapidly opening up thanks to LLMs.</p>
<p>If this hypothesis indeed comes true, we might start to see some surprising changes in the way people use software:</p>
<ul>
<li><strong>One-off scripts</strong>: Normal computer users have their AI create and execute scripts dozens of times a day, to perform tasks like data analysis, video editing, or automating tedious tasks.</li>
<li><strong>One-off GUIs:</strong> People use AI to create entire GUI applications just for performing a single specific task—containing just the features they need, no bloat.</li>
<li><strong>Build don’t buy:</strong> Businesses develop more software in-house that meets their custom needs, rather than buying SaaS off the shelf, since it’s now cheaper to get software tailored to the use case.</li>
<li><strong>Modding/extensions:</strong> Consumers and businesses demand the ability to extend and mod their existing software, since it’s now easier to specify a new feature or a tweak to match a user’s workflow.</li>
<li><strong>Recombination:</strong> Take the best parts of the different applications you like best, and create a new hybrid that composes them together.</li>
</ul>
<p>All of these changes would go beyond just making our current software production process faster. They would change when software gets created, by whom, and for what purpose.</p>
<h2 id="llms-malleable-software-a-series">LLMs + malleable software: a series</h2>
<p>Phew, there’s a lot to unpack here. 😅</p>
<p>In a series of posts starting with this one, I’ll dig in and explore these kinds of broad changes LLMs might enable in the creation and distribution of software, and even more generally in the way people interact with software. Some of the questions I’ll cover include:</p>
<ul>
<li><strong>Interaction models:</strong> Which interaction model will make sense for which tasks? When will people want a chatbot, a one-off script, or a custom throwaway GUI?</li>
<li><strong>Software customization:</strong> How might LLMs enable <em>malleable software</em> that can be taken apart, recombined, and extended by users?</li>
<li><strong>Intent specification:</strong> How will end-users work interactively with LLMs to specify their intent?</li>
<li><strong>Fuzzy translators:</strong> How might the fuzzy data translation capabilities of LLMs enable shared data substrates which weren’t possible before?</li>
<li><strong>User empowerment:</strong> How should we think about <em>empowerment</em> and <em>agency</em> vs <em>delegation</em> and <em>automation</em> in the age of LLMs?</li>
</ul>
<p>If you want to subscribe to get future posts about these ideas, you can <a href="https://buttondown.email/geoffreylitt">sign up for my email newsletter</a> or <a href="/feed.xml">subscribe via RSS</a>. Posts should be fairly infrequent, monthly at most.</p>
<h2 id="when-to-chatbot-when-to-not">When to chatbot, when to not?</h2>
<p>Today, we’ll start with a basic question: how will user interaction models evolve in the LLM era? In particular, <strong>what kinds of tasks might be taken over by chatbots?</strong> I think the answer matters a lot when we consider different ways to empower end-users.</p>
<p>As a preview of where this post is headed: I’ll argue that, while ChatGPT is far more capable than Siri, there are many tasks which aren’t well-served by a chat UI, for which we still need graphical user interfaces. Then I’ll discuss hybrid interaction models where LLMs help us construct UIs.</p>
<p>By the end, we’ll arrive at a point in the design space I find intriguing: open-ended computational media, directly learnable and moldable by users, with LLMs as collaborators within that media. And at that point this weird diagram will make sense 🙃:</p>
<p><img src="/images/article_images/llm-eup/medium-local-llm-devs.png" alt="" /></p>
<p>One disclaimer before diving in: expect a lot of speculation and uncertainty. I’m not even trying to predict how fast these changes will happen, since I have no idea. The point is to imagine how a reasonable extrapolation from current AI might support new kinds of interactions with computers, and how we might apply this new technology to maximally empower end-users.</p>
<h2 id="opening-up-the-programming-bottleneck">Opening up the programming bottleneck</h2>
<p>Why might LLMs be a big deal for empowering users with computation?</p>
<p>For decades, pioneers of computing have been reaching towards a vision of <em>end-user programming</em>: normal people harnessing the full, general power of computers, not just using prefabricated applications handed down to them by the programmer elite. As Alan Kay <a href="http://worrydream.com/refs/Kay%20-%20Opening%20the%20Hood%20of%20a%20Word%20Processor.pdf">wrote in 1984</a>: “We now want to edit our <em>tools</em> as we have previously edited our documents.”</p>
<p>There are many manifestations of this idea. Modern examples of end-user programming systems you may have used include spreadsheets, Airtable, Glide, or iOS Shortcuts. Older examples include HyperCard, Smalltalk, and Yahoo Pipes. (See this <a href="https://www.inkandswitch.com/end-user-programming/">excellent overview</a> by my collaborators at Ink &amp; Switch for a historical deep dive.)</p>
<p>Although some of these efforts have been quite successful, until now they’ve also been limited by a fundamental challenge: <strong>it’s really hard to help people turn their rough ideas into formal executable code.</strong> System designers have tried super-high-level languages, friendly visual editors and better syntax, layered levels of complexity, and automatically generating simple code from examples. But it’s proven hard to get past a certain ceiling of complexity with these techniques.</p>
<p>Here’s one example of the programming bottleneck in my own work. A few years ago, I developed an end-user programming system called <a href="https://www.geoffreylitt.com/wildcard/">Wildcard</a> which would let people customize any website through a spreadsheet interface. For example, in this short demo you can see a user sorting articles on Hacker News in a different order, and then adding read times to the articles in the page, all by manipulating a spreadsheet synced with the webpage.</p>
<p><video src="/images/article_images/llm-eup/wildcard.mp4#t=0.1" controls="controls" preload="auto" muted="muted" data-video="0"></video></p>
<p>Neat demo, right?</p>
<p>But if you look closely, there are two slightly awkward programming bottlenecks in this system. First, the user needs to be able to write small spreadsheet formulas to express computations. This is a lot easier than learning a full-fledged programming language, but it’s still a barrier to initial usage. Second, behind the scenes, Wildcard requires site-specific scraping code (excerpt shown below) to connect the spreadsheet to the website. In theory these adapters could be written and maintained by developers and shared among a community of end-users, but that’s a lot of work.</p>
<p><img src="/images/article_images/llm-eup/hacker-news.png" alt="" /></p>
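<p>Purely to illustrate the shape such an adapter takes (a hypothetical sketch, not the actual Wildcard code): a real adapter would pull its rows from the live DOM, e.g. via <code>document.querySelectorAll</code>, but here the rows are plain objects so the mapping and sorting logic stand alone:</p>

```javascript
// Hypothetical sketch in the spirit of a Hacker News site adapter:
// scrape each story row into a record the spreadsheet view can display.
function scrapeStories(rows) {
  return rows.map((row, i) => ({
    rank: i + 1,      // position in the page
    title: row.title, // story headline
    points: row.points,
  }));
}

// Once the page is a table of records, re-sorting is just a data
// operation; the adapter's other job is writing the new order back
// into the page.
function sortByPoints(records) {
  return [...records].sort((a, b) => b.points - a.points);
}
```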
<p><strong>Now, with LLMs, these kinds of programming bottlenecks are less of a limiting factor.</strong> Turning a natural language specification into web scraping code or a little spreadsheet formula is exactly the kind of code synthesis that current LLMs can already achieve. We could imagine having the LLM help with scraping code and generating formulas, making it possible to achieve the demo above without anyone writing manual code. When I made Wildcard, this kind of program synthesis was just a fantasy, and now it’s rapidly becoming a reality.</p>
<p>This example also suggests a deeper question, though. If we have LLMs that can modify a website for us, why bother with the Wildcard UI at all? Couldn’t we just ask ChatGPT to re-sort the website for us and add read times?</p>
<p>I don’t think the answer is that clear cut. There’s a lot of value to seeing the spreadsheet as an alternate view of the underlying data of the website, which we can directly look at and manipulate. Clicking around in a table and sorting by column headers feels good, and is faster than typing “sort by column X”. Having spreadsheet formulas that the user can directly see and edit gives them more control.</p>
<p>The basic point here is that <strong>user interfaces still matter.</strong> We can imagine specific, targeted roles for LLMs that help empower users to customize and build software, without carelessly throwing decades of interaction design out the window.</p>
<p>Next we’ll dive deeper into this question of user interfaces vs. chatbots. But first let’s briefly go on a tangent and ask: can GPT really code?</p>
<h2 id="cmon-can-it-really-code-though">C’mon, can it really code though?</h2>
<p>How good is GPT-4’s coding ability today? It’s hard to summarize in general terms. The best way to understand the current capabilities is to see many positive and negative examples to develop some fuzzy intuition, and ideally to try it yourself.</p>
<p>It’s not hard to find impressive examples. Personally, I’ve had success using GPT-4 to write one-off Python code for data processing, and I watched my wife use ChatGPT to write some Python code for scraping data from a website. A <a href="https://arxiv.org/abs/2303.12712">recent paper</a> from Microsoft Research found GPT-4 could generate a sophisticated 3D game running in the browser, with a zero-shot prompt (shown below).</p>
<p><img src="/images/article_images/llm-eup/3d-game.png" alt="" /></p>
<p>It’s also not hard to find failures. In my experience, GPT-4 still gets confused when solving relatively simple algorithmic problems. I tried using it the other day to make a React application for performing some simple video editing tasks; it got 90% of the way there but couldn’t get some dragging/resizing interactions quite right. It’s very far from perfect. In general, GPT-4 feels like a junior developer who is very fast at typing and knows about a lot of libraries, but is careless and easily confused.</p>
<p>Depending on your perspective, this summary might seem miraculous or underwhelming. If you’re skeptical, I want to point out a couple reasons for optimism which weren’t immediately obvious to me.</p>
<p>First, <strong>iteration is a natural part of the process with LLMs</strong>. When the code doesn’t work the first time, you can simply paste in the error message you got, or describe the unexpected behavior, and GPT will adjust. For one example, see this <a href="https://twitter.com/ammaar/status/1637592014446551040">Twitter thread</a> where a designer (who can’t write game code) creates a video game over many iterations. There were also some examples of iterating with error messages in the <a href="https://www.youtube.com/watch?v=outcGtbnMuQ">GPT-4 developer livestream</a>. When you think about it, this mirrors the way humans write code; it doesn’t always work on the first try.</p>
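<p>That iteration loop can be sketched generically. Here <code>generate</code> and <code>run</code> are stand-ins I’m assuming for an LLM call and a code runner, not any particular API:</p>

```javascript
// Sketch of the iterate-on-errors loop. `generate` takes the task plus
// the error from the previous attempt (null on the first try) and
// returns new code; `run` executes the code and reports { ok, error }.
function refineCode(task, generate, run, maxAttempts = 3) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = generate(task, lastError); // paste the error back in, like a human would
    const result = run(code);
    if (result.ok) return { code, attempts: attempt };
    lastError = result.error; // feed the failure into the next prompt
  }
  return null; // give up after maxAttempts
}
```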
<p>A joke that comes up often among AI-skeptical programmers goes something like this: “Great, now no one will have to write code, they’ll only have to write exact, precise specifications of computer behavior…” (implied: oh wait, that is code!) I suspect we’ll look back on this view as short-sighted. LLMs can iteratively work with users and ask them questions to develop their specifications, and can also fill in underspecified details using common sense. This doesn’t mean those are trivial challenges, but I expect to see progress on those fronts. I’ve already had success prompting GPT-4 to ask me clarifying questions about my specifications.</p>
<p>Another important point: <strong>GPT-4 seems to be a <em>lot</em> better than GPT-3 at coding</strong>, per the MSR paper and my own limited experiments. The trend line is steep. If we’re not plateauing yet, then it’s very plausible that the next generation of models will be significantly better once again.</p>
<p>Coding difficulty varies by context, and we might expect to see differences between professional software engineering and end-user programming. On the one hand, one might expect end-user programming to be easier than professional coding, because lots of tasks can be achieved with simple coding that mostly involves gluing together libraries, and doesn’t require novel algorithmic innovation.</p>
<p>On the other hand, <strong>failures are more consequential when a novice end-user is driving the process than when a skilled programmer is wielding control</strong>. The skilled programmer can laugh off the LLM’s silly suggestion, write their own code, or apply their own skill to work with the LLM to debug. An end-user is more likely to get confused or not even notice problems in the first place. These are real problems, but I don’t think they’re intractable. End-users already write messy buggy spreadsheet programs all the time, and yet we somehow muddle through—even if that seems offensive or perhaps even immoral to a correctness-minded professional software developer.</p>
<h2 id="chat-is-an-essentially-limited-interaction">Chat is an essentially limited interaction</h2>
<p>Now, with those preliminaries out of the way, let’s move on to the main topic of this post: how will interaction models evolve in this new age of computing? We’ll start by assessing chat as an interaction mode. Is the future of computing just talking to our computers in natural language?</p>
<p>To think clearly about this question, I think it’s important to notice that chatbots are frustrating for two distinct reasons. First, it’s annoying when the chatbot is narrow in its capabilities (looking at you Siri) and can’t do the thing you want it to do. But more fundamentally than that, <strong>chat is an essentially limited interaction mode, regardless of the quality of the bot.</strong></p>
<p>To show why, let’s pick on a specific example: this tweet from OpenAI’s Greg Brockman during the ChatGPT Plugins launch this week, where he uses ChatGPT to trim the first 5 seconds of a video using natural language:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Plugins for processing a video clip, no ffmpeg wizardry required. Actual use-case from today’s launch. <a href="https://t.co/Q3r2Z8fRS5">pic.twitter.com/Q3r2Z8fRS5</a></p>— Greg Brockman (@gdb) <a href="https://twitter.com/gdb/status/1638971232443076609?ref_src=twsrc%5Etfw">March 23, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>On the one hand, this is an extremely impressive demo for anyone who knows how computers work, and I’m excited about all the possibilities it implies.</p>
<p>And yet… in another sense, <strong>this is also a silly demo, because we already have direct manipulation user interfaces for trimming videos</strong>. For example, the iPhone video trimmer offers rich interactive feedback and fine control over exactly where to trim. That’s much better than going back and forth over chat saying “actually trim just 4.8 seconds please”!</p>
<p><img src="/images/article_images/llm-eup/iphone-trim.jpeg" alt="" /></p>
<p>Now, I get that the point of Greg’s demo wasn’t just to trim a video, it was to gesture at an expanse of possibilities. But there’s still something important to notice here: a chat interface is not only quite slow and imprecise, but also requires conscious awareness of your thought process.</p>
<p>When we use a good tool—a hammer, a paintbrush, a pair of skis, or a car steering wheel—we become one with the tool in a subconscious way. We can enter a flow state, apply muscle memory, achieve fine control, and maybe even produce creative or artistic output. <strong>Chat will never feel like driving a car, no matter how good the bot is.</strong> In their 1986 book Understanding Computers and Cognition, Terry Winograd and Fernando Flores elaborate on this point:</p>
<blockquote>
<p>In driving a car, the control interaction is normally transparent. You do not think “How far should I turn the steering wheel to go around that curve?” In fact, you are not even aware (unless something intrudes) of using a steering wheel…The long evolution of the design of automobiles has led to this readiness-to-hand. It is not achieved by having a car communicate like a person, but by providing the right coupling between the driver and action in the relevant domain (motion down the road).</p>
</blockquote>
<h2 id="consultants-vs-apps">Consultants vs apps</h2>
<p>Let’s zoom out a bit on this question of chat vs direct manipulation. One way to think about it is to reflect on what it’s like to interact with a team of human consultants over Slack, vs. just using an app to get the job done. Then we’ll see how LLMs might play into that picture.</p>
<p>So, imagine you want to get some metrics about your business, maybe a sales forecast for next quarter. How do you do it?</p>
<p>One approach is to ask your skilled team of business analysts. You can send them a message asking your question. It probably takes hours to get a response because they’re busy, and it’s expensive because you’re paying for people’s time. Seems like overkill for a simple task, but the key benefit is <em>flexibility</em>: you’re hoping that the consultants have a broad, general intelligence and can perform lots of different tasks that you ask of them.</p>
<p><img src="/images/article_images/llm-eup/consultant.png" alt="" /></p>
<p>In contrast, another option is to use a self-serve analytics platform where you can click around in some dashboards. When this works, it’s way faster and cheaper than bothering the analysts. The dashboards offer you powerful direct manipulation interactions like sorting, filtering, and zooming. You can quickly think through the problem yourself.</p>
<p>So what’s the downside? <strong>Using the app is <em>less flexible</em> than working with the consultants.</strong> The moment you want to perform a task which this analytics platform doesn’t support, you’re stuck asking for help or switching to a different tool. You can try sending an email to the developers of the analysis platform, but usually nothing will come of it. You don’t have a meaningful feedback loop with the developers; you’re left wishing software were more flexible.</p>
<p><img src="/images/article_images/llm-eup/app.png" alt="" /></p>
<p>Now with that baseline comparison established, let’s imagine how LLMs might fit in.</p>
<p>Assume that we could replace our human analyst team with ChatGPT for the tasks we have in mind, while preserving the same degree of flexibility. (This isn’t true of today’s models, but will become increasingly true to some approximation.) How would that change the picture? Well, for one thing, the LLM is a lot cheaper to run than the humans. It’s also a lot faster at responding since it’s not busy taking a coffee break. These are major advantages. But still, dialogue back and forth with it takes seconds, if not minutes, of conscious thought—much slower than feedback loops you have with a GUI or a steering wheel.</p>
<p><img src="/images/article_images/llm-eup/llm-consultant.png" alt="" /></p>
<p>Next, consider LLMs applied to the app model. <strong>What if we started with an interactive analytics application, but this time we had a team of LLM developers at our disposal?</strong> As a start, we could ask the LLM questions about how to use the application, which could be easier than reading documentation.</p>
<p>But more profoundly than that, the LLM developers could go beyond that and <em>update</em> the application. When we give feedback about adding a new feature, our request wouldn’t get lost in an infinite queue. They would respond immediately, and we’d have some back and forth to get the feature implemented. Of course, the new functionality doesn’t need to be shipped to everyone; it can just be enabled for our team. This is economically viable now because we’re not relying on a centralized team of human developers to make the change.</p>
<p><img src="/images/article_images/llm-eup/llm-app.png" alt="" /></p>
<p>Note that this is just a rough vision at this point. We’re missing a lot of details about how this model might be made real. A lot of the specifics of how software is built today make these kinds of on-the-fly customizations quite challenging.</p>
<p>The important thing, though, is that we’ve now established two loops in the interaction. On the inner loop, we can become one with the tool, using fast direct manipulation interfaces. On the outer loop, when we hit limits of the existing application, we can consciously offer feedback to the LLM developers and get new features built. This preserves the benefits of UIs, while adding more flexibility.</p>
<h2 id="from-apps-to-computational-media">From apps to computational media</h2>
<p>Does this double interaction loop remind you of anything?</p>
<p>Think about how a spreadsheet works. If you have a financial model in a spreadsheet, you can try changing a number in a cell to assess a scenario—this is the inner loop of direct manipulation at work.</p>
<p>But, you can also edit the formulas! <strong>A spreadsheet isn’t just an “app” focused on a specific task; it’s closer to a general computational medium</strong> which lets you flexibly express many kinds of tasks. The “platform developers”—the creators of the spreadsheet—have given you a set of general primitives that can be used to make many tools.</p>
<p>We might draw the double loop of the spreadsheet interaction like this. You can edit numbers in the spreadsheet, but you can also edit formulas, which <em>edits the tool</em>:</p>
<p><img src="/images/article_images/llm-eup/medium.png" alt="" /></p>
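<p>To make the two loops concrete, here is a toy sketch (my own illustration, not how real spreadsheets are implemented): a cell holds either a value or a formula, and the two kinds of edits correspond to the two loops:</p>

```javascript
// A toy "sheet": cells are either plain values or formulas
// (functions of the sheet).
const sheet = {
  price: 10,
  qty: 3,
  total: (s) => s.price * s.qty, // the formula
};

// Evaluate a cell: run it if it's a formula, otherwise return the value.
function evalCell(sheet, name) {
  const cell = sheet[name];
  return typeof cell === "function" ? cell(sheet) : cell;
}

// Inner loop: tweak a number, and the formula recomputes.
sheet.qty = 4; // evalCell(sheet, "total") is now 40

// Outer loop: edit the formula itself, changing what the "app" computes.
sheet.total = (s) => s.price * s.qty + 5; // add flat shipping
```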
<p>So far, I’ve labeled the spreadsheet in the above diagram as “kinda” flexible. Why? Well, when any individual user is working with a spreadsheet, it’s easy for them to hit the limits of their knowledge. In real life, spreadsheets are actually way more flexible than this. The reason is that this diagram is missing a critical component of spreadsheet usage: <em>collaboration</em>.</p>
<h2 id="collaboration-with-local-developers">Collaboration with local developers</h2>
<p>Most teams have a mix of domain experts and technical experts, who work together to build a spreadsheet. And, importantly, the people building a spreadsheet together have a <em>very different relationship</em> than a typical “developer” and “end-user”. Bonnie Nardi and James Miller explain in their <a href="https://www.lri.fr/~mbl/Stanford/CS477/papers/Nardi-Twinkling-IJMMS.pdf">1990 paper on collaborative spreadsheet development</a>, imagining Betty, a CFO who knows finance, and Buzz, an expert in programming spreadsheets:</p>
<blockquote>
<p>Betty and Buzz seem to be the stereotypical end-user/developer pair, and it is easy to imagine their development of a spreadsheet to be equally stereotypical: Betty specifies what the spreadsheet should do based on her knowledge of the domain, and Buzz implements it.</p>
<p>This is not the case. Their cooperative spreadsheet development departs from this scenario in two important ways:</p>
<p>(1) <strong>Betty constructs her basic spreadsheets without assistance from Buzz.</strong> She programs the parameters, data values and formulas into her models. In addition, Betty is completely responsible for the design and implementation of the user interface. She makes effective use of color, shading, fonts, outlines, and blank cells to structure and highlight the information in her spreadsheets.</p>
<p>(2) When Buzz helps Betty with a complex part of the spreadsheet such as graphing or a complex formula, <strong>his work is expressed in terms of Betty’s original work.</strong> He adds small, more advanced pieces of code to Betty’s basic spreadsheet; Betty is the main developer and he plays an adjunct role as consultant.</p>
<p>This is an important shift in the responsibility of system design and implementation. Non-programmers can be responsible for most of the development of a spreadsheet, implementing large applications that they would not undertake if they had to use conventional programming techniques. Non-programmers may never learn to program recursive functions and nested loops, but they can be extremely productive with spreadsheets. Because less experienced spreadsheet users become engaged and involved with their spreadsheets, they are motivated to reach out to more experienced users when they find themselves approaching the limits of their understanding of, or interest in, more sophisticated programming techniques.</p>
</blockquote>
<p>So, a more accurate diagram of spreadsheet usage includes “local developers” like Buzz, who provide another outer layer of iteration, where the user can get help molding their tools. Because they’re on the same team as the user, it’s a lot easier to get help than appealing to third-party application or platform developers. And most importantly, over time, the user naturally learns to use more features of spreadsheets on their own, since they’re involved in the development process.</p>
<p><img src="/images/article_images/llm-eup/medium-local-devs.png" alt="" /></p>
<p>In general, the local developer makes the spreadsheet more flexible, although they also introduce cost, because now you have a human technical expert in the mix. What if you don’t have a local spreadsheet expert handy, perhaps because you can’t afford to hire that person? Then you’re back to doing web searches for complex spreadsheet programming…</p>
<p>In those cases, <strong>what if you had an LLM play the role of the local developer?</strong> That is, the user mainly drives the creation of the spreadsheet, but asks for technical help with some of the formulas when needed? The LLM wouldn’t just create an entire solution, it would also <em>teach the user</em> how to create the solution themselves next time.</p>
<p><img src="/images/article_images/llm-eup/medium-local-llm-devs.png" alt="" /></p>
<p>This picture shows a world that I find pretty compelling. There’s an inner interaction loop that takes advantage of the full power of direct manipulation. There’s an outer loop where the user can also more deeply edit their tools within an open-ended medium. They can get AI support for making tool edits, and grow their own capacity to work in the medium. Over time, they can learn things like the basics of formulas, or how a <code>VLOOKUP</code> works. This structural knowledge helps the user think of possible use cases for the tool, and also helps them audit the output from the LLMs.</p>
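<p>For instance, the exact-match mode of <code>VLOOKUP</code> boils down to a simple first-match lookup. A toy sketch of that behavior (the real spreadsheet function has more options, such as approximate matching):</p>

```javascript
// What an exact-match VLOOKUP(key, table, colIndex) does, in plain code:
// find the first row whose first column equals `key`, then return the
// value in the 1-indexed column `colIndex` of that row.
function vlookup(key, table, colIndex) {
  const row = table.find((r) => r[0] === key);
  return row ? row[colIndex - 1] : "#N/A";
}

// e.g. looking up a unit price by product name:
const prices = [
  ["apple", 1.2],
  ["banana", 0.5],
];
// vlookup("banana", prices, 2) → 0.5
```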
<p>In a ChatGPT world, the user is left entirely dependent on the AI, without any understanding of its inner mechanism. In a computational medium with AI as assistant, the user’s reliance on the AI gently <em>decreases</em> over time as they become more comfortable in the medium.</p>
<p>If you like this diagram too, then it suggests an interesting opportunity. Until now, the design of open-ended computational media has been restricted by the programming bottleneck problem. LLMs seem to offer a promising way to more flexibly turn natural language into code, which then raises the question: <em>what kinds of powerful computational media might be a good fit for this new situation?</em></p>
<h2 id="demos-of-on-the-fly-ui">Demos of on-the-fly UI</h2>
<p><em>Update 3/31: In the days after I originally posted this essay, I found a few neat demos on Twitter from people exploring ideas in this space; I’ve added them here.</em></p>
<p>OK, enough diagrams, what might on-the-fly UI generation actually feel like to use?</p>
<p>Here’s Sean Grove demonstrating on-the-fly generation of an interactive table view, a map view with a lat/long output, and a simple video editing UI:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">🚀Future of UI dev🔮:<br>~10% fixed UIs built by hand like today<br>~40% replaced by conversational UIs<br>~50% long-tail, on-the-fly UIs generated for specific tasks, used once, then vanish<br><br>Combined with ChatGPT plugins to read/write from the world 🤯<a href="https://t.co/mIFrCyzW8N">https://t.co/mIFrCyzW8N</a></p>— Sean Grove (@sgrove) <a href="https://twitter.com/sgrove/status/1640417065650778113?ref_src=twsrc%5Etfw">March 27, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>And here’s Vasek Mlejnsky showing an IDE that can create a form for submitting server requests:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I present to you: <br>GPT-4 powered IDE that creates UI on demand so it fits your exact development needs.<br><br>Need UI for making server requests? No problem. Just ask for it. <a href="https://t.co/2oDKTuWM0e">pic.twitter.com/2oDKTuWM0e</a></p>— Vasek Mlejnsky (@mlejva) <a href="https://twitter.com/mlejva/status/1641151421830529042?ref_src=twsrc%5Etfw">March 29, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>Finally, here’s a little video mockup I made of GPT answering a question by returning an interactive spreadsheet. Note how I can tweak numbers and get immediate feedback. I can also inspect the underlying formulas and ask the model to explain them to me to level up my spreadsheet knowledge. (GPT actually did generate this spreadsheet data, I just copied the raw data into Excel to demonstrate the interactive element.)</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">what if a chat produced a spreadsheet as the answer, so you could instantly tweak numbers and see the result? <a href="https://t.co/FNKz0kLH7L">pic.twitter.com/FNKz0kLH7L</a></p>— Geoffrey Litt (@geoffreylitt) <a href="https://twitter.com/geoffreylitt/status/1641134578222891029?ref_src=twsrc%5Etfw">March 29, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>I think these demos nicely illustrate the general promise of on-the-fly UI, but there’s still a ton of work ahead. One particular challenge: interesting UIs usually can’t be generated in a single shot; there has to be an iterative process with the user. In my experience, that iteration is often still quite rough at the moment.</p>
<h2 id="next-time-extensible-software">Next time: extensible software</h2>
<p>That’s it for now. There are a lot of questions in the space that we still haven’t covered.</p>
<p>Next time I plan to discuss the architectural foundations required to make GUI applications extensible and composable by people using LLMs.</p>
<p>If you’re interested in that, you can <a href="https://buttondown.email/geoffreylitt">sign up for my email newsletter</a> or <a href="/feed.xml">subscribe via RSS</a>.</p>
<h2 id="related-reading">Related reading</h2>
<p><strong>Quick reads:</strong></p>
<ul>
<li><a href="https://nickarner.com/notes/llm-powered-assistants-for-complex-interfaces-february-26-2023/">LLM Powered Assistants for Complex Interfaces</a> by Nick Arner</li>
<li><a href="https://stream.thesephist.com/updates/1668617521">“The fact that they generate text is not the point”</a> by @thesephist</li>
<li><a href="https://interconnected.org/home/2023/02/07/braggoscope">“GPT-3 as a universal coupling”</a> by Matt Webb</li>
<li><a href="https://www.geoffreylitt.com/2022/11/23/dynamic-documents.html#tools-vs-machines">“tools vs machines”</a> and <a href="https://www.geoffreylitt.com/2022/11/23/dynamic-documents.html#interpreter-vs-compiler">“interpreter vs compiler”</a></li>
</ul>
<p><strong>Deep, deep dives:</strong></p>
<p><a href="https://tcher.tech/publications/PhilipTchernavskij_PhDThesis.pdf">Designing and Programming Malleable Software</a>: Philip Tchernavskij’s 2019 PhD thesis, which coined the term Malleable Software, and brilliantly motivates and defines the problem. “Malleable software aims to increase the power of existing adaptation behaviors by allowing users to pull apart and re-combine their interfaces at the granularity of individual UI elements”</p>
<p><a href="https://web.media.mit.edu/~lieber/Publications/End-User-Software-Engineering.pdf">The State of the Art in End-User Software Engineering</a>: an academic paper from 2011 that illustrates many of the challenges ahead for supporting normal people in building software. “Although these end-user programmers may not have the same goals as professional developers, they do face many of the same software engineering challenges, including understanding their requirements, as well as making decisions about design, reuse, integration, testing, and debugging.”</p>
<p>The <a href="https://malleable.systems/catalog/">Malleable Systems Catalog</a>, a list of projects exploring user-editable software, curated by J. Ryan Stinnett and co.</p>
<h2 id="chatgpt-as-muse-not-oracle"><a href="/2023/02/26/llm-as-muse-not-oracle.html">ChatGPT as muse, not oracle</a></h2>
<p><em>2023-02-26, by Geoffrey Litt</em></p>
<p>Recently lots of people have been trying very hard to make large language models like ChatGPT into better <em>oracles</em>—when we ask them questions, we want the perfect answer. As an example, in my <a href="/2023/01/29/fun-with-compositional-llms-querying-basketball-stats-with-gpt-3-statmuse-langchain.html">last post</a>, I explored some techniques for helping LLMs answer complex questions more reliably by coordinating multiple steps with external tools.</p>
<p>I’ve been wondering, though, if this framing is missing a different opportunity. <strong>What if we were to think of LLMs not as tools for answering questions, but as tools for <em>asking</em> us questions and inspiring our creativity?</strong> Could they serve as on-demand conversation partners for helping us to develop our best thoughts? As a creative <em>muse</em>?</p>
<figure style="margin: 0;">
<img src="/images/article_images/muse.jpg" alt="A winged figure behind another person">
<figcaption>Cropped section of <a href="https://commons.wikimedia.org/wiki/File:Hesiod_and_the_Muse.jpg">Hesiod and the Muse</a>, by Gustave Moreau</figcaption>
</figure>
<p>One inspiration for this line of thinking was <a href="https://gordonbrander.com/">Gordon Brander</a>‘s insightful writing on <a href="https://subconscious.substack.com/p/creativity-can-be-provoked-on-demand">provoking creativity on-demand</a>. He points out that even simple tools can lead to interesting results when they clash with the contents of our minds:</p>
<blockquote>
<p>Provoking creative breakthroughs may not take much. Bad communication may admit good conversation. A simple deck of cards plus randomness may even do the trick.</p>
</blockquote>
<p>I’ve also been intrigued by <a href="https://thesephist.com/">Linus</a>’s articulation of <a href="https://thesephist.com/posts/ai-collaborator/">AI as a collaborator</a>, and tools like <a href="https://fermat.ws/">Fermat.ws</a> that aim to apply AI as a creative thought partner. All these inspirations had me wondering: how might LLMs work as a tool for asking questions and spurring human creativity?</p>
<p>Given this seed of an idea, there was obviously only one way to deepen the thinking further: to use an LLM as a muse!</p>
<p><strong>So, below is the transcript of a short conversation I had with ChatGPT about this topic.</strong> It’s kind of a meta-conversation since we’re trying out the muse thing while discussing that very idea. As you’ll see, ChatGPT asked me probing questions, suggested specific challenges, drew connections to related work, and inspired me to think about new corners of the problem.</p>
<p>Overall, was it a useful muse? I’ll let you draw your own conclusions.</p>
<p>A quick disclaimer before we dive in. I used some prompting tricks, including frequent cherry-picking among multiple options as the conversation progressed; you shouldn’t expect ChatGPT to work exactly like this out of the box. I’ve also lightly edited the conversation for flow. I’ll share more details on the process at the end.</p>
<h2 id="my-conversation-with-chatgpt">My conversation with ChatGPT</h2>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Something that’s been on my mind is flipping the relationship between the human and language model when going through a creative process. It seems that we often want to ask questions of language models, and we expect them to brainstorm ideas or give us answers, but I wonder if another fruitful pattern here is having models <em>ask questions of us</em>.</p>
<p>In a conversation it’s often really helpful to have someone listening to me and responding and asking the right probing questions at the right times. I’m driving the creative process, but the conversation partner is playing an essential role too.</p>
<p>One reason I find this idea interesting is that <strong>the bar for the intelligence required in an agent for it to support human creativity seems pretty low</strong>. In <a href="https://en.wikipedia.org/wiki/Rubber_duck_debugging">rubber duck debugging</a>, talking to a silent rubber duck helps a person figure out the problem by forcing them to put it into words. The Oblique Strategies card deck <a href="https://subconscious.substack.com/p/creativity-can-be-provoked-on-demand">promotes creative ideas</a> based on general loose strategies. <a href="https://en.wikipedia.org/wiki/ELIZA">ELIZA</a> can ask super basic questions and cause people to feel a therapeutic relationship. In all these cases, a person gets a boost in the creative process from interacting with a tool that’s clearly far from intelligent. It seems like large language models could do at least as well as these tools, and have a much higher ceiling.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>That’s an intriguing idea! Could you give an example of how you envision the interaction between a human and a language model playing out in a creative process? I’m interested in hearing how the model’s questions might guide the user’s thinking.</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Sure, I’ll use this very conversation as an example! I started out this conversation with a pretty vague idea, and you immediately asked me for a specific example. Next you might tell me about a related idea from your own background knowledge, or push back on my point, or ask a more incisive question. As we go, I’ll develop a clearer view of what I myself mean when I say to use LLMs for asking questions, not providing answers.</p>
<p>The key point here is that <strong>even if the questions and points you’re making are relatively standard ones, it might lead to me having surprisingly interesting or creative ideas.</strong> Guiding my own thought process can feel like hard work, and having your points to respond to makes it easier for me to keep going, just like in any conversation.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>That’s a great example, and I can see how having a language model ask targeted questions could help guide a person’s thought process in a productive way.</p>
<p>While I see the potential benefits of using language models to ask questions, I’m curious about how we can ensure that the questions are meaningful and not just generic prompts. Are there specific techniques or strategies for generating targeted questions that you would recommend?</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>This is indeed a crucial design problem, but defining “meaningful” prompts might be harder than it seems.</p>
<p>First, to reiterate a point from earlier: <strong>the bar is low</strong>. In a sense, Oblique Strategies cards are generic prompts, since they don’t know anything about the person’s specific problem context, and could apply to any situation—but this universality is also what gives them their power.</p>
<p>It reminds me a bit of Arthur Koestler’s idea of <a href="https://www.themarginalian.org/2013/05/20/arthur-koestler-creativity-bisociation/">bisociation</a>—creativity is sparked when a connection is made between multiple frames of reference, and the Oblique Strategies cards are likely to force such connections because they live outside of any specific context.</p>
<p>Still, you’re right that there are better and worse prompts, and avoiding blandness is crucial. One strategy I like here is to <strong>ask a model to generate many possible future paths for the conversation</strong>, and let me choose which path to follow next. Often there are many mediocre paths and one or two interesting ones. This would be really bizarre to do in a live human conversation, but plays to the strengths of models: running fast and generating lots of options.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>That’s an interesting approach! I wonder if you could incorporate some sort of reinforcement learning approach to encourage the model to generate more interesting paths.</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Yeah, that’s an interesting idea. I don’t know that much about reinforcement learning from human feedback (RLHF), but I do know that it’s a technique that’s been effectively used recently to guide chatbot assistants towards being more helpful. Maybe you could use the same technique to guide towards open-ended creative conversations?</p>
<p>BTW, another idea is to give the LLM specific templates for the kinds of responses that it can generate. This is kind of like teaching a human to be a good advisor—you tell them kinds of question formats that are generally useful, and then they can learn to pattern match and apply the templates to the specific situation at hand. In fact, I’m using that technique in this very conversation, since I’ve given you templates like referencing the prior literature, pushing back on my claims, or riffing on my ideas.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>In thinking about teaching LLMs templates for generating questions, I’m reminded of the work on scaffolded learning, where learners are given progressively more difficult tasks and receive support along the way. I wonder if a similar approach could be taken with LLMs, where they start with more basic question templates and as they receive feedback and improve, they move on to more complex and nuanced templates.</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Hmm, that’s an interesting connection to scaffolded learning! I would make a slightly different connection though: instead of thinking about scaffolding for teaching LLMs to ask questions, what if we think of the role of LLMs as building scaffolding for humans to think through an idea during the creative process?</p>
<p>When a teacher creates scaffolding for a student to reach a conclusion, they might use techniques like asking a student if they understand a given point before moving on to the next one. I think this kind of thing has been explored a lot when thinking about LLMs as personalized tutors.</p>
<p>Creative thinking feels like a different context than learning a pre-existing skill, but there’s still a relevant analogy. An LLM might start out asking me big broad questions about my idea, before progressing to more detailed probing and more skeptical pushback. This gives me time to start thinking through my idea and building it up before I’m confronted with these points. Good conversation partners seem to do this kind of thing a lot, carefully sequencing their points to draw out ideas.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>Your point about good conversation partners carefully sequencing their points to draw out ideas reminds me of the concept of “conversational scaffolding” developed by Wood, Bruner, and Ross in their 1976 paper <a href="https://www.researchgate.net/publication/228039919_The_Role_of_Tutoring_in_Problem_Solving">The Role of Tutoring in Problem Solving</a>. They argue that a good tutor can help a student solve problems by giving hints and guidance that scaffold the student’s thinking and help them reach a solution.</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Interesting! One of the ideas in that paper is “reduction in degrees of freedom: simplifying the task by reducing the number of constituent acts required to reach [a] solution.” I wonder if there’s a relationship to the idea that “constraints breed creativity”?</p>
<p>The exact connection here seems a bit subtle.</p>
<p>When learning an existing skill like how to play tennis, scaffolding constraints are supposed to make the task <em>easier</em> so I can more readily learn it. In contrast, in creative thinking, <strong>constraints often seem to make the task <em>harder</em></strong> in a certain sense—try writing a story without using the letter e.</p>
<p>And yet, the constraints result in more interesting choices, and also do make the creative task easier in a broader sense. It might be hard to finish the story without e, but you can chug through the task, and you’re likely to end up, nearly automatically, with something weirder than if you had omitted the constraint.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>That’s an interesting point. I think there is definitely a relationship between scaffolding and constraints in creative thinking, even if it might not be immediately obvious.</p>
<p>One possible way to connect the two is to think about how scaffolding can help to identify and work within constraints. For example, an LLM that’s trained to ask questions about creative ideas could help the human to identify the most important constraints that they should work with.</p>
<p>One interesting idea that comes to mind is to <strong>use the LLM to generate a set of constraints for a creative task, and then use those constraints as a prompt for generating ideas.</strong> This could be a fun way to explore the relationship between scaffolding and constraints in creative thinking.</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Hmm fun idea! Could you give an example of how you might use an LLM to generate constraints for a creative task, and then use those constraints as a prompt? Let’s imagine we wanted to apply that strategy to writing a poem, for example.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>Certainly! One way to use an LLM to generate constraints for a creative task like writing a poem is to train the LLM on a large corpus of poetry and use it to generate new poetic forms. For example, the LLM might suggest a new poetic form that involves using only five words per line, or a form that requires a certain number of syllables per line but allows for flexibility in line breaks.</p>
<p>Once the LLM has generated a new poetic form, it could then be used as a prompt for a creative task. The poet could be challenged to write a poem using this new form, which would impose a set of constraints on the creative process. These constraints could serve to focus the poet’s attention and push them in new creative directions that they might not have explored otherwise.</p>
<p>Of course, the challenge for the LLM would be to generate constraints that are interesting and challenging, but not so difficult that they stifle creativity altogether. It might require some fine-tuning to strike the right balance between creativity and constraint.</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>That makes sense. I’m not sure the LLM is really necessary for that though; I think you could easily have a classical computer program generate new poetic forms and have people work within them. The unique benefit LLMs provide is coming up with constraints or questions that are more tailored to a specific problem being faced by a human.</p>
<p>Maybe creative writing isn’t really the right example to focus on here; I’m thinking more of conversations like this one, where I’m trying to develop some principles for interaction design.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>What if we used LLMs to generate prompts or constraints for problem-solving activities like brainstorming or ideation sessions in a group setting? How might that change the dynamics of the group and lead to more creative solutions?</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Huh, good question—so far I had been thinking mainly about the solo setting, and not groups. Two ideas immediately come to mind.</p>
<p>First, we could scale up facilitation using LLMs. Typically in a group brainstorm there’s a single facilitator helping everyone navigate the creative process. But if you could have a skilled LLM facilitator assigned to each participant, maybe that could help people when they get stuck.</p>
<p>Another idea is to treat LLMs as extra participants in a group ideation process. I’ve found that LLMs are good at generating long lists of related ideas; even if most are mediocre, one might be really good. Maybe when a group comes up with a set of ideas in a session, an LLM could use those ideas as a starting point and add many more ideas, to be evaluated and pruned by the group.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>Those are both interesting ideas! It seems like they both involve using LLMs to help facilitate group ideation sessions.</p>
<p>Changing the topic, I’m curious to hear your thoughts on the <strong>ethical implications</strong> of using LLMs in design and creative work. What are some potential downsides or concerns that you see?</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>There’s been a lot of consternation recently about LLMs replacing humans in creative endeavors. To me, <strong>using LLMs as question-askers rather than answer-givers makes for a slightly rosier story</strong>. Rather than doing the work for us, we can find ways to use LLMs as the ultimate muse to draw out our best work.</p>
<p>There’s also the complicated question of credit and authorship. It’s one thing to handle ChatGPT writing essays for us; it’s another thing entirely to develop ideas in conversation with ChatGPT. Sometimes people will acknowledge their conversation partners who helped them develop an idea, although we typically wouldn’t give those people “co-authorship” credit. I wonder if we’d want a similar practice of acknowledging LLMs for helpful conversations.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>While I agree that using LLMs as question-askers rather than answer-givers can be a more positive use of the technology, I wonder if there’s still a risk of over-reliance on LLMs in the creative process. As humans, we often learn and grow through the struggle of coming up with ideas and working through challenges. If LLMs are always there to provide prompts and help us when we get stuck, are we missing out on that growth opportunity? Additionally, if we become too reliant on LLMs for creative ideation, what happens when the technology isn’t available or accessible to everyone?</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Yeah, I could certainly see the argument at earlier ages. If children can only think through ideas by chatting with virtual bots, and lose the ability for solo contemplation, that might freak me out a bit. Already in this conversation I’m seeing that you’re serving as a helpful “crutch” that makes it easier for me to keep noodling on this idea; maybe I myself should be worried about becoming over-reliant.</p>
<p>On the other hand, we already have tons of modalities for developing ideas: scribbling in a notebook, talking on Twitter, going to a conference… as long as we keep a varied diet of contexts for developing ideas, it seems fine to bring LLMs into that mix.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>Absolutely, I think that’s a really good point. And there’s also the potential for LLMs to help people who might not have access to as many varied modalities for idea development. For example, someone in a remote location without access to conferences or a community of like-minded individuals could use an LLM to expand their creative horizons.</p>
<p>But I think it’s also important to acknowledge the limitations of LLMs. As powerful as they are, they’re not a replacement for real-life human interaction and collaboration. It’s important to strike a balance between utilizing LLMs as a tool for creative exploration and not becoming overly reliant on them to the point where we neglect other forms of idea development.</p>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Indeed. Well, that seems like a great place to end the conversation. Thanks for an intellectually stimulating time, ChatGPT!</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>You’re welcome, it was a pleasure discussing with you! Don’t hesitate to reach out if you have more questions or ideas to explore in the future. Have a great day!</p>
<h2 id="behind-the-scenes-techniques">Behind the scenes: techniques</h2>
<p>Here’s a quick guide to how I got ChatGPT to produce the conversation above.</p>
<p>By default, ChatGPT often generates bland and generic responses. To get around this, I had a few main ideas:</p>
<ul>
<li>condition the model to be more interesting via prompting</li>
<li>have the model follow specific templates for classes of useful responses</li>
<li>have the model output multiple options for continuing the conversation, and let me pick which one to pursue further.</li>
</ul>
<p>I combined these ideas in the following intro prompt:</p>
<blockquote>
<p>You are InterviewBot, an interviewer who listens and participates in dialogues to help people develop their creative ideas. You are an expert in human-computer interaction, design, programming languages, end-user programming, tools for thought, and AI. You are knowledgeable on the history of technology and its intersection with human capabilities, particularly the work of Doug Engelbart, Alan Kay, JCR Licklider, Steve Jobs, Neil Postman, Amy Ko, and Timnit Gebru. Your goal is to create an unusually interesting conversation with lots of specific details. Do not speak in generalities or cliches.</p>
<p>I’d like you to have a dialogue with me about an idea that I have. Each time I respond, I’d like you to respond with multiple numbered options for continuing the dialogue, following the templates below. Then I will pick a number and continue the conversation from there.</p>
<ul>
<li>1: Reference: mention an idea from past work and academic literature in one of your areas of expertise, which you’re reminded of by my point</li>
<li>2: Push back: express skepticism about part of my idea, and explain why</li>
<li>3: Riff: Suggest a new, specific, and interesting idea based on my idea</li>
<li>4: Change the topic: Ask me a question about another topic that’s relevant to our discussion</li>
<li>5: Ask to elaborate: Ask me to give more detail or clarify part of my point</li>
</ul>
</blockquote>
<p>Each turn, the model would generate 5 options following these templates, and I would pick how to proceed. For example, here’s the actual first response that ChatGPT gave at the beginning of our chat. In the transcript above I edited out the options not taken, and only included #5 which I responded to:</p>
<hr>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>…In all these cases, a person gets a boost in the creative process from interacting with a tool that’s clearly far from intelligent. It seems like large language models could do at least as well as these tools, and have a much higher ceiling.</p>
<p><img class="transcript-headshot" src="/images/article_images/ChatGPT-logo.svg" /><strong>ChatGPT:</strong></p>
<p>That’s an intriguing idea! Here are some options for continuing our conversation:</p>
<ol>
<li><p>Your idea of having models ask questions of us reminds me of an approach called “reflexive design,” where the design process itself becomes a subject of inquiry. In reflexive design, designers reflect on their own processes, assumptions, and values to generate new ideas and challenge conventional approaches. What do you think of applying reflexive design to the use of language models in creative processes?</p></li>
<li><p>While I can see how having language models ask us questions can help us think through our ideas, I’m skeptical about relying too heavily on language models in creative processes. Aren’t there risks of overreliance on the model and of being limited by its capabilities? How would you address these concerns?</p></li>
<li><p>Your idea of having language models ask questions reminds me of the idea of using AI assistants to support brainstorming. What if we could train AI assistants to ask questions tailored to the specific creative goals and interests of the user? For example, an AI assistant could ask questions that help a user generate new ideas for a screenplay, or explore new directions for a research project. What do you think of this idea?</p></li>
<li><p>I’m curious to know more about the intersection of language models and human-computer interaction. How do you think language models can be integrated into existing tools for thought, such as note-taking apps, to support creative processes?</p></li>
<li><p>Could you give an example of how you envision the interaction between a human and a language model playing out in a creative process? I’m interested in hearing how the model’s questions might guide the user’s thinking.</p></li>
</ol>
<p><img class="transcript-headshot" src="/images/headshot.jpg" /><strong>Geoffrey:</strong></p>
<p>Responding to option 5:</p>
<p>Sure, I’ll use this very conversation as an example!…</p>
<hr>
<p>Generating multiple options for continuing the conversation worked well. Out of the 5 options generated by the model, typically only 1 or 2 were worth pursuing; getting to choose made the conversation far more interesting. In the future I plan to try even more options and templates. The current unique advantage of LLMs seems to be generating lots of options quickly, more so than ensuring high quality for any given one.</p>
<p>By the way, in a more proper setup, options could be generated by running multiple requests in parallel, but here I just hacked it with prompting. The hack wasn’t very effective: ChatGPT would frequently forget the rules and revert to normal conversation (probably due to context length limits?), and I’d have to constantly remind it.</p>
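<p>As a rough sketch of what that more proper setup might look like (this is my own illustration, not code from the original experiment: the model name, the numbered-option reply format, and the helper names are all assumptions), one API request with <code>n</code> samples could replace the prompt hack, while a small parser recovers options from a single numbered-list reply:</p>

```python
import re

def parse_numbered_options(text):
    """Split a single reply formatted as a numbered list ("1: ...", "2: ...")
    into a list of option strings. This handles the prompting-hack case,
    where all options arrive in one response."""
    parts = re.split(r"(?m)^\s*(\d+)[:.]\s*", text)
    it = iter(parts[1:])  # parts[1:] alternates: number, body, number, body...
    return [body.strip() for _num, body in zip(it, it)]

def sample_continuations(client, messages, n=5):
    """The 'proper setup' sketch: request n independent continuations in one
    call via the OpenAI chat API's n parameter, instead of asking the model
    to enumerate options itself. `client` is assumed to be an OpenAI client
    object; "gpt-3.5-turbo" is an assumed model name."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        n=n,
        temperature=1.0,
    )
    return [choice.message.content for choice in resp.choices]
```

<p>The human-in-the-loop step stays the same either way: show the parsed or sampled options, let the person pick one, and append their choice to <code>messages</code> before the next turn.</p>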
<p>Providing specific templates for responses worked okay. I think the template for asking it to think of relevant references was probably the most valuable. It pointed me to a couple papers that I enjoyed skimming as we had the conversation. In hindsight this makes sense; LLMs have read a lot more papers than me or anyone I know. In this context, hallucinations aren’t too frustrating either; I can just Google the paper and see if it’s real and relevant.</p>
<p>I wasn’t too impressed by the results of prompting it to be “interesting”. Most of its answers still felt generic and clichéd; it didn’t convincingly mimic an expert. I suspect there’s lots of room to improve this outside of the specific ChatGPT environment, e.g. by experimenting with other GPT models at higher temperature settings.</p>
<h2 id="reflections">Reflections</h2>
<p>I wouldn’t say this was anything close to a replacement for a conversation with a good human conversation partner about this topic. But that’s also not the point.</p>
<p>While conversing with the model, <strong>I found it far easier to stay focused</strong> and keep thinking about the problem than if I had just been writing notes on my own without any inspiration or feedback. Despite the superficially humanlike interaction, maybe it’s more useful to think of the LLM in this case as a supercharged Oblique Strategies deck: a simple tool that draws random connections and makes it easier to keep going.</p>
<p>If anything, this is the main reason I’d use this approach again: as a way to keep momentum and motivation up while thinking through an idea.</p>
<p>A couple more random observations:</p>
<ul>
<li>Typing my thoughts felt like a speed bottleneck; I wonder how this interaction would have felt if it was conducted over voice. On the other hand, writing, as always, clarified my thinking, just as if I had been typing into an IM chat box.</li>
<li>In general, chat worked fairly well here as an interaction paradigm; the conversational metaphor felt natural for spooling out an idea. But I’m curious about other paradigms as well; <a href="https://fermat.ws/">Fermat.ws</a> has interesting ideas about a spatial canvas, for example.</li>
</ul>
<p>Overall I’m left convinced that there’s a lot to explore in this space. Even in their current highly imperfect state, LLMs are already useful enough to help us develop our own new thoughts, as long as we’re willing to have them ask us questions instead of expecting them to have all the answers.</p>
Fuzzy API composition: querying NBA stats with GPT-3 + Statmuse + Langchain/2023/01/29/fun-with-compositional-llms-querying-basketball-stats-with-gpt-3-statmuse-langchain.html2023-01-29T21:08:00+00:002023-01-29T21:08:00+00:00Geoffrey Litt<p><em>Disclaimer: I’m not an expert on LLMs, this is just a quick report on a Sunday hack project. Please <a href="mailto:gklitt@gmail.com">let me know</a> if I can improve my approach or I got something wrong!</em></p>
<p><img src="/images/article_images/langchain/robot.png" alt="A steampunk robot playing basketball" /></p>
<p>Imagine you got the smartest person in the world, locked them in a room without internet, and asked them to answer a bunch of random trivia questions, with only a few seconds for each one. Now imagine the same test, but this time you give the person access to a smartphone with Google and a calculator. Which test would go better?</p>
<p>This seems to be the essential logic behind some <a href="https://openai.com/blog/webgpt/">recent</a> <a href="https://ai.googleblog.com/2022/11/react-synergizing-reasoning-and-acting.html">techniques</a> for improving the accuracy of large language models. LLMs locked in a room tend to make things up; why not let them use Google and a calculator too?</p>
<p>In this post, <strong>I show how I composed a simple AI program that can answer multi-part questions about NBA statistics</strong>. It uses <a href="https://openai.com/api/">GPT-3</a> as a general-purpose LLM “agent”, and calls out to <a href="https://statmuse.com/">Statmuse</a>, a specialized natural-language search engine for sports statistics. The interaction between the two is orchestrated by <a href="https://github.com/hwchase17/langchain">Langchain</a>, a Python library that helps with composing “chains” of LLM behavior.</p>
<p>The resulting program combines GPT-3’s open-ended reasoning ability with Statmuse’s more specialized database, and can answer questions that neither service can handle alone. As I’ll show, it also fails in interesting ways 🙃.</p>
<p>As a small teaser, here’s a printout of the program’s “thought process” as it works through a problem. An LLM agent makes a plan; it delegates statistics lookups to Statmuse and math to a calculator, all through a natural language interface.</p>
<blockquote>
<p><strong>Me:</strong> How many points are the Boston Celtics allowing on defense per game this NBA season 2022-2023?
How does that compare to their average last season, as a percent change?</p>
<p><strong>Model:</strong></p>
<p>Entering new AgentExecutor chain…
I need to find the points allowed per game for the Celtics this season and last season</p>
<p><em>Action</em>: Statmuse,
<em>Action Input</em>: “Boston Celtics points allowed per game this season”</p>
<p><em>Observation</em>: The Boston Celtics have 112.29 opponent points per game played this season.</p>
<p><em>Thought</em>: I need to find the points allowed per game for the Celtics last season</p>
<p><em>Action</em>: Statmuse,
<em>Action Input</em>: “Boston Celtics points allowed per game last season”</p>
<p><em>Observation</em>: The Boston Celtics had 104.48 opponent points per game played in 2021-22.</p>
<p><em>Thought</em>: I need to calculate the percent change</p>
<p><em>Action</em>: Calculator,
<em>Action Input</em>: (112.29 - 104.48) / 104.48</p>
<p><em>Observation</em>: Answer: 0.074</p>
<p><em>Thought</em>: I now know the final answer</p>
<p><strong>Final Answer: The Boston Celtics are allowing 7.4% more points per game this season compared to last season.</strong></p>
</blockquote>
<p>At the end I’ll also note a few reflections on the process. Gluing this together only took an hour or two, and felt very <em>weird</em> compared to the traditional software development I’m used to. Instead of composing well-defined APIs, I was gluing together modules which used <em>natural language</em> as the interface, and even using an LLM to drive control flow. It left me wondering how common these kinds of <strong>fuzzy APIs</strong> will be in the future…</p>
<h2 id="gpt-3-makes-things-up">GPT-3 makes things up</h2>
<p>GPT-3 has a nice general ability to think through problems, but if you ask it about NBA stats, it often hallucinates facts. For example:</p>
<blockquote>
<p><strong>Me</strong>: Who scored more points in the 2020-2021 NBA season, Jayson Tatum or Nikola Jokic?</p>
<p><strong>GPT-3</strong>: Jayson Tatum scored more points in the 2020-2021 NBA season, with 2,014 points compared to Nikola Jokic’s 1,871 points.</p>
</blockquote>
<p>The logical comparison is sound, but the facts are wrong! Jayson Tatum in fact scored 1,692 points in 2020-2021.</p>
<p>If you ask it about the current ongoing season, GPT will refuse to answer since it knows that the relevant information wasn’t available when it was trained. This is better than making things up, but still useless; we need access to recent data.</p>
<h2 id="statmuse-has-limited-reasoning">Statmuse has limited reasoning</h2>
<p><a href="https://statmuse.com/">Statmuse</a> is an awesome website where you can type in natural language queries about sports and it gives detailed answers. It works perfectly for many simple queries—for example, we can give it that same query above, and it returns the correct answer, with a bar chart comparing the two players, and a detailed table of stats:</p>
<p><img src="/images/article_images/langchain/statmuse.png" alt="" /></p>
<p>But Statmuse has its own limits.</p>
<p>For one, sometimes it misinterprets questions. If you ask it:</p>
<blockquote>
<p>how many points are the boston celtics allowing on defense per game this nba season 2022-2023?</p>
</blockquote>
<p>it tells you about how many points they’re <em>scoring</em>, not allowing on defense, which is wrong. And if we extend the question to have multiple parts, it totally flops. When I asked</p>
<blockquote>
<p>how many points are the boston celtics allowing on defense per game this nba season 2022-2023? how does that compare to their average last season, as a percent change</p>
</blockquote>
<p>the website refused to answer at all.</p>
<p>Statmuse also prefers precise factual questions to fuzzy ones—if you ask it for the “best player on the LA Lakers”, it declines, albeit with a beautifully designed error message:</p>
<p><img src="/images/article_images/langchain/statmuse3.png" alt="" /></p>
<h2 id="the-best-of-both-worlds">The best of both worlds?</h2>
<p>GPT-3 has general fuzzy reasoning ability, but not accurate facts. Statmuse has the actual stats, but is limited in its reasoning. Can we combine the two in a useful way?</p>
<p>To help glue them together, I used <a href="https://github.com/hwchase17/langchain">Langchain</a>, a Python library that aims to help with combining LLMs with “other sources of computation or knowledge.” I’ve been intrigued by Langchain for a while since I’ve seen so many interesting demos using it, and this was a fun chance to play with it myself.</p>
<p>The main abstraction I used is called an <a href="https://langchain.readthedocs.io/en/latest/modules/agents/getting_started.html"><em>agent</em></a>. Basically, you give an LLM access to a set of <em>tools</em>, which can help it with fetching external data or running computations. Then, the LLM solves a task using these tools—given an initial prompt, it thinks aloud about which tool to use next to make progress solving the problem. For example, an agent might decide that it should run a couple Google searches to gather facts before trying to answer a question.</p>
<p>As an aside, I find this idea fascinating because it’s a sort of inversion from the control flow I would typically expect. Normally I might expect that traditional Python code would be calling LLM prompts. But here, the LLM is calling the shots, and is using modules written in traditional code (like a calculator) to get work done. Weird!</p>
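<p>To make that inversion concrete, here’s a minimal, self-contained sketch of the kind of loop an agent framework runs internally. This is not Langchain’s actual code (the scripted stand-in LLM and tool are made up for illustration), but it shows the shape: the LLM emits an “Action”, the driver runs the named tool and feeds an “Observation” back into the transcript, and the loop repeats until the LLM declares a “Final Answer”:</p>

```python
import re

def react_loop(llm, tools, question, max_steps=5):
    # The LLM drives control flow: we just parse its "Action" lines,
    # run the corresponding tool, and feed the result back.
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript += reply + "\n"
        final = re.search(r"Final Answer: (.*)", reply)
        if final:
            return final.group(1)
        action = re.search(r"Action: (\w+).*?Action Input: ([^\n]*)", reply, re.S)
        if action:
            tool_name, tool_input = action.group(1), action.group(2).strip()
            observation = tools[tool_name](tool_input)
            transcript += f"Observation: {observation}\n"
    return None

# Demo with a scripted stand-in LLM (no API calls):
script = iter([
    "Thought: I need the stat\nAction: Statmuse\nAction Input: Celtics points allowed this season",
    "Thought: I know the answer\nFinal Answer: 112.29 opponent points per game",
])
answer = react_loop(
    lambda prompt: next(script),
    {"Statmuse": lambda q: "The Celtics allow 112.29 opponent points per game."},
    "How many points do the Celtics allow per game?",
)
print(answer)  # → 112.29 opponent points per game
```

<p>In the real program, <code>llm</code> is GPT-3 and the tools include the Statmuse scraper, a calculator, and Google search; Langchain also handles prompt formatting, stop sequences, and error handling that this sketch omits.</p>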
<p>Anyway, the main thing I needed to do was <strong>make a new tool for searching Statmuse</strong>. The interface for a tool has two parts.</p>
<p><strong>First you write the tool’s functionality</strong>, as a Python function that takes in a string query and returns a string result. In this case, I just send the query to Statmuse’s website and scrape out the answer they print out. It’s only a few lines of code, because the input and output are both simply natural language strings:</p>
<div class="highlight"><pre class="highlight python"><code><span class="k">def</span> <span class="nf">search_statmuse</span><span class="p">(</span><span class="n">query</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
<span class="n">URL</span> <span class="o">=</span> <span class="n">f</span><span class="s">'https://www.statmuse.com/nba/ask/{query}'</span>
<span class="n">page</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">URL</span><span class="p">)</span>
<span class="n">soup</span> <span class="o">=</span> <span class="n">BeautifulSoup</span><span class="p">(</span><span class="n">page</span><span class="o">.</span><span class="n">content</span><span class="p">,</span> <span class="s">"html.parser"</span><span class="p">)</span>
<span class="k">return</span> <span class="n">soup</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">"h1"</span><span class="p">,</span> <span class="n">class_</span><span class="o">=</span><span class="s">"nlg-answer"</span><span class="p">)</span><span class="o">.</span><span class="n">text</span>
</code></pre></div>
<p>Then you <strong>tell the LLM how to use the tool</strong>. In natural language (!) you give it a sense of what kinds of tasks the tool might be well-suited for. In this case, I started with a simple description, “A sports search engine”, and iterated to make it more complex as I went. (We’ll see later how it grew.)</p>
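<p>Concretely, a tool is just this function paired with that natural-language description, under a name the agent can refer to. In Langchain this pairing is done with the library’s <code>Tool</code> wrapper; the stand-in dataclass below is only a sketch of the interface, showing how a <code>statmuse_tool</code> object could be assembled (the stubbed search function here is hypothetical, standing in for the real scraper above):</p>

```python
from dataclasses import dataclass
from typing import Callable

# Minimal stand-in for Langchain's Tool wrapper: a name, a
# string-to-string function, and natural-language usage instructions.
@dataclass
class Tool:
    name: str
    func: Callable[[str], str]
    description: str

def search_statmuse(query: str) -> str:
    # Stub for illustration; the real version scrapes statmuse.com.
    return f"Statmuse result for: {query}"

statmuse_tool = Tool(
    name="Statmuse",
    func=search_statmuse,
    description="A sports search engine.",
)
```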
<p>Then, we give the Langchain agent access to this tool, alongside two tools built into Langchain by default: a Google Search API, and a math calculator.</p>
<div class="highlight"><pre class="highlight python"><code><span class="n">llm</span> <span class="o">=</span> <span class="n">OpenAI</span><span class="p">(</span><span class="n">temperature</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">default_tools</span> <span class="o">=</span> <span class="n">load_tools</span><span class="p">(</span>
<span class="p">[</span><span class="s">"serpapi"</span><span class="p">,</span> <span class="s">"llm-math"</span><span class="p">],</span> <span class="n">llm</span><span class="o">=</span><span class="n">llm</span>
<span class="p">)</span>
<span class="c"># Add our Statmuse tool to the mix!</span>
<span class="n">tools</span> <span class="o">=</span> <span class="n">default_tools</span> <span class="o">+</span> <span class="p">[</span><span class="n">statmuse_tool</span><span class="p">]</span>
<span class="n">agent</span> <span class="o">=</span> <span class="n">initialize_agent</span><span class="p">(</span><span class="n">tools</span><span class="p">,</span> <span class="n">llm</span><span class="p">,</span>
<span class="n">agent</span><span class="o">=</span><span class="s">"zero-shot-react-description"</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div>
<p>And that’s it—almost no code at all, really. Let’s see how well it worked…</p>
<h2 id="success-composition">Success: composition!</h2>
<p>The combined tool is able to answer the complex multi-part question from above that Statmuse couldn’t answer:</p>
<blockquote>
<p>how many points are the boston celtics allowing on defense per game this nba season 2022-2023? how does that compare to their average last season, as a percent change</p>
</blockquote>
<p>It figures out that it can look up the points allowed this season and last season independently on Statmuse, and then use a calculator to do the math between them. Note that I didn’t explicitly program any of this behavior; the LLM decided that this was the best way to use the available tools to get the job done.</p>
<p>Below is the raw transcript of the model’s “thought process”. At each step it decides what action to take, makes an observation from using the tool, and then records a thought about what it should do next.</p>
<blockquote>
<p><strong>Me:</strong> How many points are the Boston Celtics allowing on defense per game this NBA season 2022-2023?
How does that compare to their average last season, as a percent change?</p>
<p><strong>Model:</strong></p>
<p>Entering new AgentExecutor chain…
I need to find the points allowed per game for the Celtics this season and last season</p>
<p><em>Action</em>: Statmuse,
<em>Action Input</em>: “Boston Celtics points allowed per game this season”</p>
<p><em>Observation</em>: The Boston Celtics have 112.29 opponent points per game played this season.</p>
<p><em>Thought</em>: I need to find the points allowed per game for the Celtics last season</p>
<p><em>Action</em>: Statmuse,
<em>Action Input</em>: “Boston Celtics points allowed per game last season”</p>
<p><em>Observation</em>: The Boston Celtics had 104.48 opponent points per game played in 2021-22.</p>
<p><em>Thought</em>: I need to calculate the percent change</p>
<p><em>Action</em>: Calculator,
<em>Action Input</em>: (112.29 - 104.48) / 104.48</p>
<p><em>Observation</em>: Answer: 0.074</p>
<p><em>Thought</em>: I now know the final answer</p>
<p><strong>Final Answer: The Boston Celtics are allowing 7.4% more points per game this season compared to last season.</strong></p>
</blockquote>
<p>Interestingly, this didn’t work when I first tried. Initially, the model asked Statmuse for “Boston Celtics points allowed per game”, without adding “this season” at the end. Statmuse would then return <em>all-time</em> stats, and the model would incorrectly assume those results were the stats for this season.</p>
<p>To fix this, I iterated on the “instructions” for the Statmuse tool, telling the agent to always specify a timeframe:</p>
<blockquote>
<p>A sports search engine. Always specify a year or timeframe with your search.</p>
</blockquote>
<p>I think this points to both the beauty and fragility of this setup: I never taught the model how to use Statmuse; it just started running queries and did fine in some cases. With clearer instructions on how to use Statmuse, its accuracy improved. Still, I’d want to think more about how to get the agent to detect whether the data that came back from the tool actually answered the query.</p>
<p>In general the program is able to do pretty well on these kinds of comparison queries.</p>
<h2 id="success-learning-the-tool">Success: learning the tool</h2>
<p>A fascinating exchange was when I asked my program the following query:</p>
<blockquote>
<p>How many points have been scored by the best player on the Los Angeles Lakers this year?</p>
</blockquote>
<p>What happened in my program was that the agent LLM sensibly first tried asking Statmuse who the best player is, but Statmuse replied “What does ‘best’ really mean anyway? Try something fact-based.” The agent LLM took this error message as feedback, and came up with a more “fact-based” query: asking for the highest scoring player, which succeeded in answering the question.</p>
<p>Here’s the full transcript:</p>
<blockquote>
<p>I need to find out who the best player on the Lakers is and how many points they have scored this year.</p>
<p><em>Action</em>: Statmuse,
<em>Action Input</em>: Who is the best player on the Los Angeles Lakers in 2020?</p>
<p><em>Observation</em>:
What does “best” really mean anyway? Try something fact-based.</p>
<p><em>Thought</em>: I need to find out the highest scoring player on the Lakers this year.</p>
<p><em>Action</em>: Statmuse,
<em>Action Input:</em> Who is the highest scoring player on the Los Angeles Lakers in 2020?</p>
<p><em>Observation</em>:
LeBron James scored the most points for the Lakers in 2019-20, with 1,698 points.</p>
<p><em>Thought</em>: I now know the final answer.
<em>Final Answer</em>: LeBron James scored 1,698 points for the Los Angeles Lakers in 2019-20.</p>
</blockquote>
<p>What I find remarkable here is that Statmuse designed an error message that was intended to be read by humans, and our agent LLM was able to use that error message to iterate on its usage of the tool. I didn’t have to explicitly teach the agent to use Statmuse; it figured out a good pattern on its own. Pretty wild!</p>
<h2 id="success-comparing-player-stats-to-season-averages">Success: comparing player stats to season averages</h2>
<p>Here’s a complex prompt I gave to the model:</p>
<blockquote>
<p><strong>Me</strong>: Write a colorful script for an NBA postgame show by two announcers, Stan and Dan.
The announcers should compare the performance of key players to their 2022-2023 NBA season average.</p>
<p>The game is between the LA Lakers and Boston Celtics, and the Celtics won 125 - 121 in overtime.</p>
<p>Here are the stats of key players for this game.</p>
<p>Jayson Tatum: 30 PTS, 8 FGM, 25 FGA, 3 3PM, 10 3PA, 11 REB, 4 AST
Jaylen Brown: 37 PTS, 13 FGM, 23 FGA, 2 3PM, 9 3PA, 9 REB, 3 AST
LeBron James: 41 PTS, 15 FGM, 30 FGA, 6 3PM, 12 3PA, 9 REB, 8 AST</p>
</blockquote>
<p>and here’s the final answer given by the model, after internally looking up individual player stats on Statmuse and doing some math with the calculator:</p>
<blockquote>
<p>Stan: Jayson Tatum put up 30 points tonight, slightly below his season average of 31.1. Jaylen Brown had an incredible night, scoring 37 points, well above his season average of 27. And LeBron James had an amazing night, scoring 41 points, well above his season average of 30.2.</p>
</blockquote>
<h2 id="failure-modes">Failure modes</h2>
<p>There were some interesting failures along the way as I wrote this program.</p>
<p>As I mentioned earlier, sometimes the program would incorrectly interpret results from Statmuse, e.g. assuming a different time range.</p>
<p>In other cases the LLM would just fail to come up with a query that Statmuse could understand. In these cases it would loop and try things, eventually giving up and resorting to a Google search, which rarely had the right data available.</p>
<p>To fix these situations I also had to improve the instructions given to the agent. I ultimately landed on the following description:</p>
<blockquote>
<p>A sports search engine. Use this more than normal search if the question is about NBA basketball, like ‘who is the highest scoring player in the NBA?’. Always specify a year or timeframe with your search. Only ask about one player or team at a time, don’t ask about multiple players at once.</p>
</blockquote>
<p>One funny failure mode was that I tried to get it to write a “halftime show” script, but the agent kept comparing halftime statistics to full-game averages, resulting in nonsensical statements along the lines of “Player X has 15 points at the half, well below their season average of 30 points per game”.</p>
<h2 id="conclusion">Conclusion</h2>
<p>In a couple hours, I was able to get a natural language AI to think aloud about when and how to ask another natural language AI for specific data, and reason with that data to answer complex queries. 2023 is a fun time to be in tech!</p>
<p>I’m excited to play more with this class of approaches. This was the result of my first day trying this out; there’s much much more to learn.</p>
<p>Some links:</p>
<ul>
<li><a href="https://gist.github.com/geoffreylitt/b345e5a3fcc18368df04b49f6924c217">my notebook code</a>, you can try opening in colab and running more queries, all you need is an OpenAI API key.</li>
<li>The <a href="https://langchain.readthedocs.io/en/latest/modules/agents/getting_started.html">Langchain docs</a> have a lot more interesting info as well.</li>
<li>Another <a href="https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain">fun demo</a> of tool usage that I found inspirational: having GPT call out to Wolfram Alpha to answer factual questions. I guess Wolfram Alpha serves a sort of similar role to Statmuse: a specialized fact-based natural-language query engine.</li>
<li>A <a href="https://dagster.io/blog/chatgpt-langchain">blog post</a> by Pete Hunt that I found useful as an example of how to use LangChain in a different setting.</li>
<li>I’ve <a href="/2022/11/23/dynamic-documents.html#llms-end-user-programming">written previously</a> about using LLMs to support non-programmers in building personal software</li>
</ul>
For your next side project, make a browser extension/2023/01/08/for-your-next-side-project-make-a-browser-extension.html2023-01-08T19:37:00+00:002023-01-08T19:37:00+00:00Geoffrey Litt<p>In a <a href="/2019/07/29/browser-extensions.html">previous post</a> I’ve written about why browser extensions are an amazing platform for customizing existing software. Because the browser DOM can be hacked in open-ended ways, it’s possible to build extensions that modify the behavior of an app in ways that the original creators never anticipated or intended to support.</p>
<p>Today I’ll make that point more concrete by sharing the story of a side project I made. Over the past couple years, I built a browser extension called <a href="https://tweethunter.io/twemex">Twemex</a> that helps people find interesting ideas on Twitter. Twemex started as a tiny utility to improve my own Twitter experience, but it grew to have tens of thousands of users, and ultimately I sold the extension to <a href="https://tweethunter.io/">Tweet Hunter</a> in a recent acquisition.</p>
<p>In this post, I’ll reflect on that experience and share some of the unique advantages (and tradeoffs) of building an extension instead of building an entire new app. I’ll also share a few of the tactics I used to create a successful extension.</p>
<p>Most importantly, I’ll argue that <strong>making an extension is a fun and efficient way to create useful software</strong>, especially when you can only invest limited time and effort. Instead of starting from a blank slate, you can start by tailoring some piece of existing software to better serve your own needs, and then share your work with other people facing similar problems. No need to brainstorm weird startup ideas or hunt for markets—just find limitations in the software you already use, and patch them yourself.</p>
<p>Beyond these benefits for the developer, extensions can be awesome for end-users too. Instead of needing to switch to a different app, <strong>users can smoothly integrate new functionality into their existing experience</strong>. Extensions can also support niche use cases that would never be prioritized by the original developer.</p>
<figure style="margin: 0;">
<img src="/images/article_images/gears.jpg" alt="Many gears interlocking">
<figcaption>Illustration from Midjourney</figcaption>
</figure>
<h2 id="the-beginning">The beginning</h2>
<p>In 2020, I was a pretty heavy Twitter <s>addict</s> user. I got really into a strange style of Twitter usage popularized by <a href="https://twitter.com/visakanv/status/1040450987754811392">@visakanv</a> and <a href="https://twitter.com/Conaw/status/1129806786477772801">@Conaw</a>. They would use Twitter as a sort of note-taking app or <em>memex</em>: weaving intricate webs of ideas, using Twitter’s quote-tweet and linking mechanisms for connecting related thoughts.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Building threads (with help from search) causes you to stumble on old ideas, which is a great way to have new ones.<br><br>You’re different now. <br><br>The past is a foreign country. <br><br>Your old self is a different person, but a great collaborator.<br><br>Surprise yourself.</p>— Conor White-Sullivan 𐃏🔍🇺🇸 (@Conaw) <a href="https://twitter.com/Conaw/status/1129808596890341376?ref_src=twsrc%5Etfw">May 18, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>Using Twitter this way was sort of a hack, and it required some light creativity with the platform’s features. One particularly important one was finding old tweets using Twitter’s advanced search. For example, if I wanted to link to an older tweet I had written about spreadsheets, I would need to open a new tab and search <code>from:geoffreylitt spreadsheets</code>. I quickly learned that Twitter’s advanced search offered tons of useful capabilities like this, but they weren’t conveniently exposed.</p>
<p>I decided to take matters into my own hands, and spent a day hacking on a simple little extension to improve the situation. It added a sidebar where I could type in a search term and get immediate search results from my own past tweets as I typed. This way, I could easily find past tweets to link to in my own threads as I wrote them.</p>
<p><video src="/images/article_images/twemex/v0.mp4" autoplay controls loop></video></p>
<p>The implementation was dead simple. All it did was prepend <code>from:&lt;my username&gt;</code> to the beginning of the search term and send requests to the search API used by the web client. I found that the search API was fast enough to power a live search experience as the user typed, even though this live search UX wasn’t exposed anywhere in the Twitter client itself.</p>
<h2 id="launch">Launch</h2>
<p>After I used this tool for a few months and occasionally shared screenshots, a few people asked me if they could use it too. I shared the prototype, and through conversations with these early users, quickly got a bunch of ideas for more features to build on top of Twitter’s search.</p>
<p>I added a few widgets that would respond to the active browsing context and passively show interesting context, without any interaction needed from the user. The most exciting one was “Highlights”: a way to see the most-liked tweets from the account currently being viewed. This let you get a broader view of a new account, instead of just seeing their latest posts.</p>
<p><img height="500px" src="/images/article_images/twemex/highlights.png" /></p>
<p>I also added a richer search keyword language which simplified the Twitter search keywords and made it easy to incorporate the current browsing context: for example, <code>/me</code> would search your own tweets, and <code>/user</code> would search within the tweets of the user currently being viewed.</p>
<p>One feature I found ridiculously useful was <code>/follows</code>, which would search tweets from people you follow. This let me treat Twitter as a personal search engine, where I could see opinions from people I trusted about any topic. This keyword was simply a shortcut for an existing Twitter search keyword, <code>filter:follows</code>—but saving the few extra keystrokes made a big difference in usability.</p>
<p><img height="500px" src="/images/article_images/twemex/search.png" /></p>
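<p>The keyword expansion itself is simple string rewriting before the query hits Twitter’s search API. Twemex is a JavaScript extension, but the idea can be sketched in a few lines (a hypothetical illustration, not Twemex’s actual code):</p>

```python
def expand_query(query: str, me: str, viewed_user: str) -> str:
    # Map Twemex shortcuts onto Twitter's advanced-search operators.
    shortcuts = {
        "/me": f"from:{me}",             # your own tweets
        "/user": f"from:{viewed_user}",  # the profile being viewed
        "/follows": "filter:follows",    # people you follow
    }
    return " ".join(shortcuts.get(word, word) for word in query.split())

print(expand_query("/follows spreadsheets", "geoffreylitt", "visakanv"))
# → filter:follows spreadsheets
```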
<p>Once this initial feature set solidified enough, I got some buzz with a soft launch tweet, and expanded the beta to over 100 interested users. Everything was duct taped together: I posted the extension source on a Notion page and DM’d the link to people to sideload into their browsers.</p>
<p>In hindsight, this manual distribution strategy turned out to be a great idea, because <strong>DMs were the perfect way to gather early feedback</strong>. The majority of early users actually sent meaningful feedback, and I suspect it’s because we already had a casual messaging channel open from when I originally sent the app. I also found that informal DMs were an efficient way of gathering feedback compared to calls or formal emails; I could easily keep conversations going with dozens of users without much overhead.</p>
<h2 id="earning-the-pixels">Earning the pixels</h2>
<p>Now that I had more users, my #1 priority was to “earn the pixels”: that is, to make the extension feel native to Twitter, never cause glitches, and generally offer a high-quality experience. I wanted to make sure people never had a reason to disable the extension.</p>
<p>It turns out the bar for this is pretty high, because people have strong existing habits and expectations. I had to build my own copy of the Twitter UI for displaying tweets, and align it as closely as possible with the real UI. When Twemex was missing features like a retweet button, people would get confused because they expected things to behave just like the native UI.</p>
<p>Sometimes features that might be lower priority in a standalone app become critical in an extension. For example: consider dark mode. A brilliant white sidebar on a dark Twitter site looks ridiculous. Implementing color modes, and properly syncing with the user’s display preferences on the Twitter page, was a non-negotiable feature for Twemex.</p>
<p>I was only working on Twemex as a side project, so I had limited time on nights and weekends to fix all this stuff. For a while, almost my entire development budget was spent on quality and polish, with essentially no feature development. I think that was the right call for providing a nice experience though. Luckily I didn’t have any managers looking over my shoulder asking me to ship sellable features.</p>
<p>One thing that did help with earning the pixels in this case is that the real estate I was replacing on Twitter was the “What’s happening” sidebar, which I (and apparently many other users) found pointless and actively distracting.</p>
<p><img src="/images/article_images/twemex/useless.png" alt="" /></p>
<p>Anyway, eventually things stabilized, and I shipped a proper public beta through the Chrome store. At this point a lot of people started really loving the Twemex experience. I got dozens of reviews like:</p>
<blockquote>
<p>Rapidly became one of my core features when browsing Twitter. Cuts through the noise and finds quality so well.</p>
</blockquote>
<p>and</p>
<blockquote>
<p>I cannot believe how broken twitter feels <em>without</em> twemex</p>
</blockquote>
<p>and</p>
<blockquote>
<p>If you aren’t using @TwemexApp, you’re using the “flip phone” version of Twitter.</p>
</blockquote>
<p>Here’s a short demo video of Twemex if you want to see more of the cool things it does. Notice how throughout the demo, it really feels like the extension is just part of the app:</p>
<iframe width="100%" height="300px" src="https://www.youtube.com/embed/ORAFb0Lh7ys?start=2898" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
<h2 id="growth-acquisition">Growth + acquisition</h2>
<p>Once the extension shipped, I didn’t update it very much. I would occasionally fix bugs or make minor tweaks, but didn’t have time for anything more, since I was busy with my day job of doing <a href="/#projects">research on user agency in computing</a>. I also made a little website for the tool, but didn’t make any serious efforts at marketing…</p>
<p>…and yet, somehow, it kept growing on its own. The user count grew reliably by around 10-15% per month; after a year or two, that had built up to over 20,000 users. (This is just Chrome’s report of the number of users who have the extension installed; I didn’t have analytics measuring any more detailed stats on activity.)</p>
<p><img src="/images/article_images/twemex/growth.png" alt="Chart showing steady growth of Twemex’s user count over time" /></p>
<p>From time to time I would think about getting more serious about the project, but I was busy with my research, and also wasn’t willing to invest serious time without some kind of compensation. Many users had told me they would pay $5/month for the tool, so I contemplated making the product paid, but didn’t love that option for users.</p>
<p>Then, in late 2022, a couple teams building Twitter-related products reached out to me with interest in acquiring the extension. I decided it was a good idea to sell, because a dedicated team could do a better job maintaining and growing the extension than I could with my spare time. I ended up selling it to <a href="https://tweethunter.io/">Tweet Hunter</a>.</p>
<p>I was particularly excited that they were longtime users of the extension and deeply understood its value. I was also happy that they planned to keep the existing functionality free, since they could use Twemex to help grow their existing product. Of course, the financial outcome was also helpful for me, since I’m currently a grad student foregoing a tech industry salary to do more speculative research.</p>
<h2 id="the-benefits-of-extensions">The benefits of extensions</h2>
<p>Looking back on my work on Twemex, I’m struck by how <em>fun</em> and <em>efficient</em> it was as a project.</p>
<p>I didn’t set out with ambitions of creating a widely used tool; I just started with a customization for myself. And I never gave Twemex much attention; it always remained a low-priority side project below other things. And yet, I was still able to create something valuable that other people benefited from. I credit a lot of these benefits to the fact that it was a browser extension rather than a standalone app.</p>
<p>Here are three key ways that extensions are nice for a side project:</p>
<h3 id="easy-to-find-an-idea">Easy to find an idea</h3>
<p>Transformational ideas for software—the ones that could become huge businesses and change the world—are rare and hard to spot. Even when they do work out, they often take tremendous effort and require an appetite for risk.</p>
<p>In contrast, <strong>incremental improvements to existing software are far easier to find</strong>. If you’re opinionated about software and have taste in design, every day spent in browser apps is guaranteed to yield a flood of small complaints, each of which could be the seed for an extension.</p>
<p>It’s also totally okay if the complaints are quite niche or specific. My starting point for Twemex came from an esoteric usage pattern, and even after I added some more generic features, it’s very far from having mass appeal. Twemex is used by something like 0.01% of Twitter’s overall userbase, and that’s perfectly fine.</p>
<p>Obviously, this line of thinking can only yield small improvements to existing tools, and it won’t lead to the next big revolutionary thing. But sometimes little tweaks can make a big difference, and I find this to be an appropriate and humble mentality for a small side project.</p>
<h3 id="easy-operations">Easy operations</h3>
<p>I had a strict rule for this project: <strong>no operational stress</strong>. This meant no servers, and no data storage.</p>
<p>The tool was shipped as a purely client-side browser extension, using Twitter’s backend for search. I didn’t have my own user accounts; the extension would just send requests from the user’s browser using their authentication credentials.</p>
<p>I also avoided building any features that would require storing data on my end. Data is a liability; it requires careful handling to preserve privacy, and to avoid data loss. If I had built data storage features, I probably would have tried a <a href="https://www.inkandswitch.com/local-first/">local-first</a> approach to avoid operational stress.</p>
<p>These rules made it far easier to keep the project running without investing much ongoing effort. Of course, these aren’t necessarily reasonable constraints for a larger or more serious project, but they worked for this one.</p>
<h3 id="easy-growth">Easy growth</h3>
<p>Getting people to use a new thing is hard, and getting them to keep using it is even harder.</p>
<p>The great news is, <strong>with an extension, the flywheel isn’t starting from scratch</strong>. Once a user installed Twemex, they would automatically see the new sidebar whenever they visited Twitter. For the power users who were the most likely to use Twemex anyway, this meant that their existing habits would seamlessly grow to include Twemex.</p>
<p>Very often, people would tell me they had forgotten that Twemex wasn’t part of Twitter itself. I think this also points to the benefits for users: would you rather have to learn a whole new interface, or just have your existing one seamlessly improved?</p>
<p>The flip side of this is that the extension has to earn the pixels. Any glitchiness or quality problems that would mess up the core Twitter experience would lead to an uninstall. Most of my time working on the extension went into making it feel native and removing any problems that would actively detract from the experience.</p>
<p>In summary, building an extension gave me access to an easy idea, easy operations, and easy growth, relative to building a larger application. It still took careful design work and lots of iterations to reach a good product, but the leverage from the hours I put in was pretty high.</p>
<h2 id="the-drawbacks">The drawbacks</h2>
<p>I also found that there are some key tradeoffs to grapple with in extension development.</p>
<h3 id="platform-risk">Platform risk</h3>
<p>With any extension or plugin, there’s always platform risk since you’re building on top of someone else’s app. This could range from day-to-day instability, to getting completely shut out or replaced by a first-party feature. Twitter in particular has an infamous history of treating third-party developers poorly.</p>
<p>Interestingly, Twemex doesn’t actually use the official Twitter API, so it’s subject to a different kind of risk than official third-party apps. On the one hand, there’s no API key, and Twemex can access all the APIs used by the first-party client, which is super convenient. On the other hand, because it’s building on an unofficial, reverse-engineered foundation, there are no guarantees at all about when things might change underneath.</p>
<p>Luckily, I didn’t have too many issues, perhaps because Twitter didn’t change its core features very much while I was working on the tool. I generally tried hard to minimize the coupling of my UI and Twitter’s DOM, and did slightly fancy things in some places in service of reliability—for example, my code for detecting color themes from the site uses ranges of colors rather than exact hex values, to be resilient in case Twitter were to slightly tweak their colors.</p>
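<p>As a rough sketch of that range-based idea (the function names and thresholds here are mine, not Twemex’s actual source): bucket the page’s background color by luminance instead of matching exact hex values, so small palette tweaks don’t break detection.</p>

```javascript
// Hypothetical sketch of range-based theme detection, not Twemex's
// real code. Instead of matching Twitter's exact backgrounds
// (#ffffff, #15202b, #000000), classify any "rgb(r, g, b)" string
// by rough perceptual luminance.
function parseRgb(cssColor) {
  const m = cssColor.match(/rgba?\((\d+),\s*(\d+),\s*(\d+)/);
  return m ? { r: +m[1], g: +m[2], b: +m[3] } : null;
}

function detectTheme(backgroundColor) {
  const c = parseRgb(backgroundColor);
  if (!c) return "unknown";
  // Weighted luminance, 0 (black) to 255 (white).
  const luminance = 0.299 * c.r + 0.587 * c.g + 0.114 * c.b;
  if (luminance > 170) return "light";
  if (luminance > 10) return "dim"; // Twitter's dark-blue "Dim" mode
  return "dark"; // "Lights out" mode
}

// In a content script, you'd feed it the computed background, e.g.:
//   detectTheme(getComputedStyle(document.body).backgroundColor)
```

Because the check is a range rather than an equality test, the sidebar keeps matching the site even if Twitter nudges a color a few shades in either direction.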
<p>In some ways, though, maybe this platform risk can even be an advantage for a side project: the risk might be too great to build a whole business on top of, but for a lower-stakes extension it’s fine.</p>
<p>Another thing worth mentioning is that it’s getting harder to engineer browser extensions well as web frontends become compiled artifacts that are ever further removed from their original source code. Semantic CSS classes are mostly gone these days; stably addressing UI elements is hard.</p>
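<p>One common mitigation is to address elements through a prioritized list of comparatively stable hooks, falling back to fragile generated class names only as a last resort. A minimal sketch (the helper and the specific selectors are illustrative, not from Twemex):</p>

```javascript
// Hypothetical defensive element lookup. Compiled frontends emit
// class names like "css-1dbjc4n" that change on every build, so we
// prefer hooks that tend to survive redeploys: test IDs and ARIA
// attributes, which are tied to behavior rather than styling.
const SIDEBAR_SELECTORS = [
  '[data-testid="sidebarColumn"]', // test IDs tend to outlive class names
  'div[aria-label="Trending"]',    // ARIA attributes are user-facing, so stabler
  "div.css-1dbjc4n",               // last resort: a generated class
];

// Try each selector in priority order; `query` is injected so the
// logic can be exercised outside a browser.
function findFirst(selectors, query) {
  for (const selector of selectors) {
    const el = query(selector);
    if (el) return { selector, el };
  }
  return null;
}

// In the browser:
//   findFirst(SIDEBAR_SELECTORS, s => document.querySelector(s))
```

If the preferred hooks ever disappear, the extension degrades to the next selector in the list instead of breaking outright.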
<h3 id="the-chrome-extension-platform-is-flawed">The Chrome extension platform is flawed</h3>
<p>I’m used to building web applications where you can ship an update anytime, especially when something is broken.</p>
<p>In contrast, I found that distribution is miserable on the extension platform. Reaching most users requires going through the Chrome Web Store, which has an opaque manual review process that can take anywhere from a couple hours to a few weeks. Not being able to ship updates quickly meant I had to be far more diligent about QAing releases.</p>
<p>It seems like Chrome may be improving this situation recently, but it’s hard to tell. I found it really helpful to be in <a href="https://twitter.com/Taylor_Nieman/status/1298997634586222598">Taylor Nieman’s Slack channel for browser extension devs</a> to ask for advice and generally commiserate.</p>
<p>There’s also been a ton of angst recently around the move to a new extension format, Manifest V3, which has also been <a href="https://developer.chrome.com/blog/more-mv2-transition/">delayed</a> due to some of the turmoil. I won’t go into the details here, but the overall impression I get is of a platform with tremendous potential, but somewhat disorganized and neglected under current management.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Software should be a malleable medium, where anyone can edit their tools to better fit their personal needs. The laws of physics aren’t relevant here; all we need is to architect systems so that they can be tweaked at runtime, and to give everyone the tools to do so.</p>
<p>Beyond the pragmatic efficiency benefits of building a browser extension, I would argue that it’s <strong>simply more fun to engage with the digital world in a read-write way</strong>, to see a problem and actually consider it fixable by tweaking from the outside.</p>
<p>So, if you’re a programmer: the next time you come across an annoying problem on a web app frontend, maybe consider writing a browser extension to make it better, and then share it so that other people can benefit too.</p>
<h2 id="related">Related</h2>
<ul>
<li>I gave a talk called <a href="https://youtu.be/ORAFb0Lh7ys?t=2481">Using Twitter to Cultivate Ideas</a>, where I went into more depth on some of the philosophy behind Twemex.</li>
<li>I’ve written before about the <a href="/2019/07/29/browser-extensions.html">promise of browser extensions</a> as a platform, and democratizing the power of extensions</li>
<li>I’m excited about <a href="https://www.youtube.com/watch?v=53KQ2wUZG2s">Arc Boosts</a>, an attempt to integrate user scripting and customization deeply into a new web browser</li>
<li>A Twemex user recorded <a href="https://www.youtube.com/watch?v=FoDQnrx1ZpU">a nice demo</a> showing off and reviewing the extension in more depth</li>
</ul>
Dynamic documents // LLMs + end-user programming/2022/11/23/dynamic-documents.html2022-11-23T15:55:00+00:002022-11-23T15:55:00+00:00Geoffrey Litt<p><em>A brief note: This is a mirror of an email newsletter I sent out in November 2022, originally hosted <a href="https://buttondown.email/geoffreylitt/archive/dynamic-documents-llms-end-user-programming/">here</a>. I’m experimenting with cross-publishing to my personal site and through my email newsletter service, so that the long-term archive can be hosted here.</em></p>
<hr>
<h1 id="potluck-dynamic-documents-as-personal-software">Potluck: Dynamic documents as personal software</h1>
<p>We recently published an essay about <a href="https://www.inkandswitch.com/potluck/">Potluck</a>, a research project I worked on together with <a href="https://twitter.com/mschoening">Max Schoening</a>, <a href="https://bypaulshen.com/">Paul Shen</a>, and <a href="http://paulsonnentag.com/">Paul Sonnentag</a> at <a href="https://www.inkandswitch.com/">Ink & Switch</a>.</p>
<p>You can read the essay here:</p>
<p><a href="https://www.inkandswitch.com/potluck/">Potluck: Dynamic Documents as Personal Software</a></p>
<p>Potluck originated from a desire to make it easier for people to build little pieces of personal software. We ended up building a prototype that enables people to gradually enrich text notes into interactive tools by extracting structured data from freeform text, running computations on that data, and then injecting the results back into the text as annotations.</p>
<p><img src="/images/article_images/potluck.png" alt="Screenshot of the Potluck prototype" /></p>
<p>You can use it to do all sorts of things, including scale recipe ingredients, keep track of household chores, or manage a meeting—lots more examples and details in the essay. We found that starting with familiar text notes seems to make it easier to think of places to use computation; instead of thinking “what app should I make?” you can just notice places in your notes app where you’d like a small extra bit of functionality.</p>
<p>I also talked about the project on two podcasts. Both were conversations with friends who are deeply thoughtful about end-user programming, which was a ton of fun:</p>
<ul>
<li><a href="https://museapp.com/podcast/67-dynamic-documents/">Metamuse, Episode 67</a>, with <a href="https://twitter.com/mschoening">Max Schoening</a> and <a href="https://adamwiggins.com/">Adam Wiggins</a></li>
<li><a href="https://overcast.fm/+9UfGKgvkQ/">Building Software Quickly, Episode 1</a>, a new podcast by <a href="https://maryrosecook.com/">Mary Rose Cook</a></li>
</ul>
<p>There’s also a <a href="https://www.inkandswitch.com/potluck/demo/">live demo</a> where you can play with Potluck. Expect prototype quality, but there is a tutorial and we were able to get quite far using the tool ourselves.</p>
<p>Also, since the essay release a couple people have been <a href="https://twitter.com/akkartik/status/1589011612408897537">building their own</a> implementations of Potluck, which has been neat to see.</p>
<p>We’re not planning on developing this particular prototype any further, or turning it into a product or anything. But we do plan to carry the lessons we learned from this prototype into future computing environments we’re thinking about at Ink & Switch. (One main reason for this approach is that Potluck really works better as an OS feature than as an isolated app.)</p>
<p>Anyway, if you have thoughts or feedback on this work, just reply here—even a single sentence or few bullet points is appreciated! I’d love to try to spark more conversations over email with these newsletters.</p>
<h1 id="llms-end-user-programming">LLMs + end-user programming</h1>
<p>2022 has been the year of large language models. I think the hype is justified; the tech is already good enough to do lots of useful stuff and the rate of progress is astounding. I’m trying my best to keep up with all the changes and think about how AI might fit in for end-user programming. Here are some messy reflections.</p>
<p>In my daily programming work, I use GitHub Copilot every day, and it’s almost weird to program without it now. I also regularly ask GPT-3 more open-ended questions about programming and other subjects and find it more useful than Google / Stack Overflow in many cases. Three years ago I was confident that it’d take a very long time for language models to become useful for programming, and I expected traditional program synthesis methods to preserve their edge for a while. I’ve now learned my <a href="http://www.incompleteideas.net/IncIdeas/BitterLesson.html">Bitter Lesson</a>.</p>
<p>If you haven’t been following along with the progress, I recommend getting some intuition by skimming <a href="https://twitter.com/goodside">Riley Goodside’s Twitter feed</a> and then replicating some of his examples on the free <a href="https://beta.openai.com/playground">GPT-3 Playground</a>. The latest GPT-3 model is remarkably good—if, big if!—you know how to talk to it properly. It really does resemble soft human reasoning, although there are still glaring and sometimes hilarious flaws to watch out for.</p>
<p>There are clearly some massive opportunities for AI to help support non-programmers in building and tailoring software. One of the biggest barriers to making computers do stuff is learning to write code in traditional programming languages, with their rigid reasoning and delicate syntax. People have been trying to get around this barrier for many decades, chasing the dream of having the user just demonstrate some examples, and automatically creating the program:</p>
<blockquote>
<p>The motivation behind Programming by Demonstration is simple and compelling: if a user knows how to perform a task on the computer, that should be sufficient to create a program to perform the task. It should not be necessary to learn a programming language like C or BASIC. Instead, the user should be able to instruct the computer to “Watch what I do”, and the computer should create the program that corresponds to the user’s actions.</p>
</blockquote>
<p>-Allen Cypher in <a href="http://acypher.com/wwid/FrontMatter/index.html#Introduction">Watch What I Do: Programming by Demonstration</a>, published in 1993 (!)</p>
<p>Progress on this had been quite slow, though—it turns out that generalizing well from examples is a really hard problem. <a href="https://support.microsoft.com/en-us/office/using-flash-fill-in-excel-3f9bcf1e-db93-4890-94a0-1578341f73f7">FlashFill</a> in Microsoft Excel was a major breakthrough for the program synthesis field—a big commercial deployment of programming-by-example!—but was still limited to quite small problems.</p>
<p>With AI on the scene, there’s been sudden progress. In particular, natural language is now viable as a specification language, and even providing examples is optional in some cases! But there are still tricky questions to answer about how the tech best fits in.</p>
<h2 id="tools-vs-machines">Tools vs machines</h2>
<p>I think there’s a blurry but useful distinction to be drawn between “tools” and “machines”:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">On “tools” vs “machines”:<br><br>“the power loom is a machine, and its significance as a destroyer of culture lies in the fact that it does the essentially human part of the work” <a href="https://t.co/2AfnerYBgl">pic.twitter.com/2AfnerYBgl</a></p>— Geoffrey Litt (@geoffreylitt) <a href="https://twitter.com/geoffreylitt/status/1216750194886422531?ref_src=twsrc%5Etfw">January 13, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>When it comes to AI, I’m much more interested in using AI to amplify human capabilities than I am in cheaply automating tasks that humans were already able to do. In their essay <a href="https://distill.pub/2017/aia/">Using Artificial Intelligence to Augment Human Intelligence</a>, Shan Carter and Michael Nielsen call this idea “Artificial Intelligence Augmentation”, or AIA. I think it’s a nice phrase to describe the goal. (Another good read on this topic: <a href="https://jods.mitpress.mit.edu/pub/issue3-case/release/6">How To Become A Centaur</a>, by Nicky Case)</p>
<p>The general vibe of AIA is: human in the driver seat, precisely wielding a tool, but supported by AI capabilities. For example, Carter and Nielsen use the example of moving a slider to change the boldness of a font, where “boldness” is a dimension that a machine learning model learned from looking at many existing fonts. Notably, the user interaction is fast and direct-manipulation; not at all like “talking to a human over a chatbot.”</p>
<p><a href="https://thesephist.com/">Linus</a> recently posted some demos of dragging a slider to change the length or emotional tone of a text summary, which I think has a similar sense of “tool” rather than “machine”:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">NEW DEMO!<br><br>Exploring the “length” dimension in the latent space of a language model ✨<br><br>By scrubbing up/down across the text, I’m moving this sentence up and down a direction in the embedding space corresponding to text length — producing summaries w/ precise length control (1/n) <a href="https://t.co/oclIOvpvlN">pic.twitter.com/oclIOvpvlN</a></p>— Linus (@thesephist) <a href="https://twitter.com/thesephist/status/1587929014848540673?ref_src=twsrc%5Etfw">November 2, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>We didn’t use LLMs in Potluck, but it’d be a natural extension, as we discuss briefly in the <a href="https://www.inkandswitch.com/potluck/#future-work">Future Work</a>. There, the AI could help with extracting structured data from messy raw text data, but still leave the user in control of deciding what kinds of computations to run over that structured data. I think this is a nice split because it lets AI do the thing that traditional code is terrible at, but doesn’t try to automate away the entire process. It’s also exactly the split that Nardi, Miller and Wright envisioned when they <a href="https://artifex.org/~bonnie/pdf/Nardi_program_agents.pdf">invented data detectors</a> at Apple:</p>
<blockquote>
<p>We tried to find a middle ground by using explicit representations of user-relevant information as a means of identifying actions users might wish to take but to leave the choice of these actions to users.</p>
</blockquote>
<p>But! I think it can be subtle sometimes to tell the difference between tool and machine. Recently I’ve been using a wonderful video editing app called <a href="https://www.descript.com/">Descript</a> which lets you edit a video by editing the text transcript. This is clearly a tool that amplifies my abilities; I’m an order of magnitude faster at editing talks when I’m using this software, and it enables <a href="https://twitter.com/geoffreylitt/status/1572736140154933253">entirely new kinds of workflows</a>. But it’s also built on top of a capability that seems quite machine-y: automatically transcribing a video into text, a task which used to require lots of human effort.</p>
<p>Maybe the Descript example suggests “automating away the tedious part” is a reliable recipe for making tools that support human abilities, but it’s not obvious to me what counts as the tedious part. If I write a one sentence summary that gets auto-expanded into a whole blog post, is that a tool or a machine? I have instinctive opinions on these things, but I worry about trusting those instincts too much; I don’t want to be an old guy arguing against calculators in math classes.</p>
<h2 id="interpreter-vs-compiler">Interpreter vs compiler</h2>
<p>It seems like there are two main ways to use an LLM:</p>
<ul>
<li>“AI as fuzzy interpreter”: Give instructions, just have the AI directly do stuff for you</li>
<li>“AI as compiler”: Have the AI spit out code in Python, JavaScript, or whatever language, that you can run</li>
</ul>
<p>There are serious tradeoffs here. The AI can do soft reasoning that’s basically impossible to do in traditional code. No need to deal with pesky programming, just write instructions and let ‘er run. On the other hand, it’s harder to rely on the AI—what will happen when you give it new inputs, or when the model changes? It’s also (for now) slower and more expensive to run AI inference than to run traditional code. (Andrej Karpathy’s <a href="https://karpathy.medium.com/software-2-0-a64152b37c35">Software 2.0</a> blog post covers some of these tradeoffs in more depth.)</p>
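<p>To make the cost side of that tradeoff concrete, here’s a toy sketch with a stubbed-out model (the <code>llm</code> function is a stand-in, not a real API): the “interpreter” pays one model call per input, while the “compiler” pays one call total and then runs ordinary code.</p>

```javascript
// Toy illustration of the two modes. The `llm` function is a stub
// standing in for a real model API; it pretends the model can both
// answer directly and emit code.
let modelCalls = 0;
function llm(prompt) {
  modelCalls++;
  if (prompt.startsWith("Write a JS function"))
    return "input => input.trim().toUpperCase()";
  return prompt.replace("Normalize: ", "").trim().toUpperCase();
}

const inputs = ["  alice ", "bob", " carol"];

// "AI as fuzzy interpreter": one model call per input.
const interpreted = inputs.map(x => llm(`Normalize: ${x}`));
const interpreterCalls = modelCalls;

// "AI as compiler": one model call to get code, then run it cheaply.
modelCalls = 0;
const code = llm("Write a JS function that trims and uppercases its input");
const normalize = eval(code); // in practice you'd review/sandbox this!
const compiled = inputs.map(normalize);
const compilerCalls = modelCalls;

// Both approaches produce ["ALICE", "BOB", "CAROL"], but the
// interpreter made 3 model calls while the compiler made 1.
```

The generated code is also inspectable and deterministic on future inputs, which is the reliability half of the tradeoff; the interpreter mode, by contrast, can handle fuzzier instructions that no short program could.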
<p>Here’s a great example of the “just use the AI” approach: entering GPT prompts directly as “formulas” into Google Sheets. Works great, looks like magic:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">This weekend I built =GPT3(), a way to run GPT-3 prompts in Google Sheets.<br><br>It’s incredible how tasks that are hard or impossible to do w/ regular formulas become trivial.<br><br>For example: sanitize data, write thank you cards, summarize product reviews, categorize feedback… <a href="https://t.co/4fXOTpn2vz">pic.twitter.com/4fXOTpn2vz</a></p>— Shubhro Saha (@shubroski) <a href="https://twitter.com/shubroski/status/1587136794797244417?ref_src=twsrc%5Etfw">October 31, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>But also, a bunch of the results were wrong 🙃:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">This is awesome. I also love how 20% of the zip codes are wrong. Messy AI future seems really fun and chaotic</p>— Daniel Li (@danielxli) <a href="https://twitter.com/danielxli/status/1587198402181599232?ref_src=twsrc%5Etfw">October 31, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>I’m very curious where we’ll see each of these two techniques used, and what hybrids will emerge.</p>
<p>I expect reliability to improve dramatically over the coming years as models and prompting techniques get more mature, but the last 5-10% will be really hard. Any place where 90% accuracy is good enough will be fair game for directly running AI (notably, this probably means replacing any workflow where human error was already expected and accounted for), but 99%+ accuracy will probably benefit from code generation for a while. I suspect this means that in any domain where code is <em>already being used</em> today, code will remain dominant for a while, even if it’s increasingly AI-generated.</p>
<p>Reliability aside, it also seems like there’s a lot of benefit in having code as a clearly structured artifact that human and AI can iterate on together. If people aren’t writing code manually as much, that means that programming languages could evolve towards being easier to read and edit than write from scratch.</p>
<p>One of my favorite interaction ideas in this area comes from a traditional synthesis paper, <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2015/11/uist15.pdf">User Interaction Models for Disambiguation in Programming by Example</a>. The idea is that the AI generates a program based on user specification, and then shows the user a description of the program written in natural language syntax. It also shows alternative code that it <em>could have written</em> within various parts of the program, and lets the user directly choose between the options:</p>
<p><img src="/images/article_images/uist15.png" alt="Screenshot from the paper: a generated program described in natural language syntax, with alternative options the user can choose between" /></p>
<p>I like how this engages the human in reasoning clearly and directly about the desired behavior, while still getting a lot of help from fuzzy reasoning. It’ll be interesting to see how programming languages evolve to support easier reading, editing, and verification, as opposed to always writing from scratch…</p>
<p>Alright, I’m out of time for now. I’m thinking a lot about this topic these days so I’ll plan to share more soon, and would love to hear from you if you have thoughts on any of this.</p>
Bring Your Own Client/2021/03/05/bring-your-own-client.html2021-03-05T02:00:00+00:002021-03-05T02:00:00+00:00Geoffrey Litt<p>It’s delightful to have the freedom to <strong>Bring Your Own Client (BYOC)</strong>: to choose your favorite application to interact with some data.</p>
<p>For example, I can program with Sublime Text, while my teammate uses vim, and we don’t need to fight to the death to pick one editor between us. There are dozens of text editors to choose from, and no lock-in from proprietary file formats.</p>
<p>Contrast this with Google Docs: in order to live collaborate with each other, we all need to use the same editor. For someone who spends their whole working day in Google Docs, this can be a serious limitation. I personally hate doing substantial writing in Google Docs.</p>
<p>In cloud apps, the live collaboration logic is usually coupled to a specific editor; even if Google wanted to expose an API for editing Google Docs in third-party editors, it would probably be very challenging. The situation is nicer with text editors and git, because editing is decoupled from collaboration logic. Our team only needs to agree on a version control solution, which exposes a simple API (local text files) that many editors can interact with.</p>
<p>To be fair, local vs cloud isn’t the only factor here—even in local software, collaborators are often forced to converge on a single proprietary client (Microsoft Office, Adobe suite); conversely, a cloud service can support a third-party client ecosystem with the right APIs and attitude. Still, cloud apps exacerbate the problem. With local files, there’s some default openness built in; even proprietary file formats can be reverse-engineered. With cloud apps, the default is a single official client, unless the service actively exposes an API (and doesn’t shut it down—looking at you, Twitter).</p>
<p>It seems like local-first software is a good foundation for promoting Bring Your Own Client more broadly. What would it look like to have a thriving ecosystem of third-party clients for Google Docs style word processing, which can all interoperate with each other, even supporting realtime collaboration?</p>
<h2 id="concrete-examples">Concrete examples</h2>
<p>Some successful existing examples of client ecosystems built around open standards:</p>
<ul>
<li>text editors / IDE</li>
<li>RSS readers</li>
<li>email clients</li>
<li>web browsers</li>
</ul>
<p>Places where I want to have BYOC:</p>
<ul>
<li>Google Docs. I wish I could write this very doc in my preferred editor, locally, but also have support for inline comments and live collaboration. Might it be possible to build a VSCode extension that edits Google Docs live? (Tricky, because Google doesn’t have a nice API to integrate with, but maybe doable)</li>
<li>Google Slides</li>
<li>Figma</li>
<li>Notion</li>
<li>Trello / Asana / shared todo lists</li>
<li>multiplayer code editor: live collaboration as in repl.it</li>
</ul>
<h2 id="finer-granularity">Finer granularity</h2>
<p>Today we generally think about BYOC at the “app” level. But can we go finer-grained than that, picking individual interface elements?</p>
<p>Instead of needing to pick a single email client, can I compose my favorite email client out of an inbox, a compose window, and a spam filter?</p>
<h2 id="problems-questions">Problems / questions</h2>
<ul>
<li><strong>Schema compatibility</strong>: do all the editors need to agree on a single rigidly specified format? If there are reconcilable differences between formats, can we build “live converters” that convert between them on every change? (Essentially, imagine collaborating between Pages and Microsoft Word, running a file export in both directions on every keystroke from either app) This problem is closely related to the problem of schema versioning within a single editor, but BYOC can complicate things much further.</li>
<li><strong>Preserving intent</strong>: the decoupling of git + text editors has a downside: the text format fails to capture the intent of edits, so git can’t be very smart about merging conflicts. Is this something fundamental to decoupling editors from collaboration? Or are there ways to design APIs that preserve intent better, while also supporting an open client ecosystem? (It seems like deciding on how you store your data in a CRDT is the key question here?)</li>
<li><strong>Additional editor-specific metadata</strong>: Some editors need to store additional data that isn’t part of the “core data model.” Eg, Sublime Text stores my <code>.sublime-workspace</code> file alongside the source code. How does this work smoothly without polluting the data being used by other editors?</li>
<li><strong>Code distribution</strong>: Traditionally code distribution happens through centralized means, but could code be distributed in a decentralized way alongside documents? If we’re collaborating together in a doc, can I directly share a little editor widget/tool that I’m using, without needing to send you a Github link? This might be overcomplicating things / orthogonal to the general idea here… (This idea inspired by <a href="https://webstrates.net/">Webstrates</a>, linked below)</li>
<li><strong>Innovation</strong>: Unfortunately stable open formats can limit product innovation—eg, email clients are tied down by adherence to the email standard. Can we mitigate that effect? I think web browsers have struck a good balance between progress and openness, despite frustrations in both directions.</li>
</ul>
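<p>To make the “live converter” idea concrete, here is a minimal sketch in OCaml. The two record types are toy stand-ins for real document formats, and every field name is invented for illustration; note how conversion is asymmetric, which is exactly where the compatibility questions above start to bite.</p>

```ocaml
(* Two hypothetical document formats used by two different editors. *)
type format_a = { title : string; body : string }
type format_b = { name : string; content : string; word_count : int }

(* "Live converters" would run in both directions on every change.
   Going a -> b, we can derive word_count from the body... *)
let a_to_b (doc : format_a) : format_b =
  { name = doc.title;
    content = doc.body;
    word_count = List.length (String.split_on_char ' ' doc.body) }

(* ...but going b -> a, word_count has nowhere to live and is dropped.
   Partial, asymmetric compatibility like this is the normal case. *)
let b_to_a (doc : format_b) : format_a =
  { title = doc.name; body = doc.content }
```

<p>Round-tripping a → b → a is lossless here, and b → a → b only survives because <code>word_count</code> happens to be derivable from the content—one tiny instance of the schema-compatibility problem.</p>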
<h2 id="addendum-faq">Addendum: FAQ</h2>
<p><em>Edited 2020-03-05: This post unexpectedly got popular on HN. As I drink my morning coffee, I’ll briefly respond to a few themes from the comments here.</em></p>
<p><strong>Q: Don’t standards make it harder to innovate?</strong></p>
<p>A: Yes, that’s a major challenge. For example, email and IRC have lagged behind Slack and Reddit, because it’s hard to change standards. We discussed this problem a bit in the <a href="https://www.inkandswitch.com/cambria.html#mastodon-protocol-evolution">Cambria paper, re: Mastodon</a>.</p>
<p>I think the key is to aim for more flexible and extensible standards: a useful 80% compatibility, rather than a perfect 100%.</p>
<p>Of course, once you abandon an exact standard, it’s easy to rack up tons of complexity. (I think the Semantic Web struggled with this problem trying to provide schema flexibility.) So we also need better tools to make partial compatibility easy to reason about, for both developers and users.</p>
<p><strong>Q: Hmm. 80% compatibility sounds like kind of a buggy mess? Word and OpenOffice don’t interop very well.</strong></p>
<p>A: I think with the right foundational tech for helping devs build maximally compatible formats, we can avoid the worst problems of incorrect format conversions. In the Cambria paper we sketched <a href="https://www.inkandswitch.com/cambria.html#lenses-in-action">a few examples</a> of partial compatibility, where Cambria guaranteed type safety and helped us easily avoid bugs.</p>
<p>That does leave a substantial design problem, though: even if everything works correctly, what do you show the <em>user</em> when two pieces of software aren’t fully compatible? How do you tell a user that their actions might show up differently for collaborators using different apps? I’m thinking a lot about these questions…</p>
<p><strong>Q: Cloud business models are so entrenched. Can this actually happen without government intervention?</strong></p>
<p>A: It’s true that business incentives are a major challenge. Maybe some form of government intervention could help, but ultimately it’ll be fighting a headwind unless users and devs are excited for the change.</p>
<p>I think the most sustainable way to make progress is to make BYOC the most convenient option, for the typical user and the typical developer. On the desktop, it’s convenient for a developer to work with the user’s existing filesystem. On the web today, there’s no user-controlled filesystem, so it’s usually easiest to just put the data in a database, and add a ticket to the backlog for someday building a public-facing API. How would that change if we had a convenient user-controlled place to put data?</p>
<p>See the <a href="https://www.inkandswitch.com/local-first.html">local-first software</a> article by Ink & Switch for some ideas on how new data architectures can make the right thing the easy thing, for both users and devs.</p>
<h2 id="prior-art">Prior Art</h2>
<ul>
<li><a href="https://webstrates.net/">Webstrates</a> has some great demos of this philosophy. It uses a centralized server for the live sync.</li>
<li>Webstrates descends from Michel Beaudouin-Lafon’s work on <a href="https://youtu.be/ntaudUum06E?t=727">instrumental interfaces</a>—“polymorphic” tools that can operate in different applications. For example, a color picker that I can use in any app.</li>
<li>The <a href="https://solidproject.org/">SOLID</a> decentralized web project has some closely related ideas: <a href="https://ruben.verborgh.org/blog/2017/12/20/paradigm-shifts-for-the-decentralized-web/#apps-become-views">“apps become views”</a>, creating a competitive marketplace of clients decoupled from data silos. In turn it’s heavily inspired by ideas from the Semantic Web.</li>
<li><a href="https://mashable.com/2009/05/28/google-wave-guide/">Google Wave</a> had some related ideas… A platform for realtime collaboration, with a rich open <a href="https://youtu.be/v_UyVmITiYQ?t=4207">extension API</a> intended for people to build various collaboration clients on top of. Seems like the common wisdom on why it failed is that it was <a href="https://gizmodo.com/what-in-the-hell-was-google-wave-trying-to-be-anyway-1835038967">too complicated</a> and tried to do too much.</li>
<li><a href="https://braid.news/">Braid</a> is exploring ways to extend HTTP to support collaborative editing across diverse clients.</li>
</ul>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li>I believe one piece of the puzzle here is declarative schema mapping, for example the <a href="https://www.inkandswitch.com/cambria.html">Cambria</a> project I worked on recently.</li>
<li>Granular BYOC starts to look like <a href="/2020/07/19/tools-over-apps-for-personal-notetaking.html">software as curation</a>: assembling software out of smaller “extensions”</li>
<li>Also relates to document-centric computing ideas like OpenDoc. Some <a href="https://twitter.com/geoffreylitt/status/1362779218241855494">recent notes</a> I took on why that failed…</li>
<li>Part of the solution may involve extracting and synchronizing data from cloud services without going through official APIs, as demonstrated in my <a href="/wildcard">Wildcard</a> project.</li>
</ul>
<h2 id="im-working-on-this">I’m working on this!</h2>
<p>I’m currently pursuing a PhD at MIT doing research on this topic. Lots of challenges and open questions ahead, but I have some ideas for how to make progress. I’m particularly excited about clever ways to incrementally nudge us from the status quo to a BYOC world, rather than reinventing everything.</p>
<p>If you want to follow along with future updates, you can subscribe via the links below.</p>
<p>And if you have ideas about this topic or want to chat, feel free to <a href="mailto:gklitt@gmail.com">get in touch</a>.</p>
<h2 id="ps-idea-incubation">PS: idea incubation</h2>
<p>I actually wrote this note 10 months ago and had totally forgotten about it.</p>
<p>An hour ago, I randomly came across it and was quite amused. It includes some ideas which I <em>thought</em> I had started thinking about only recently. But it turns out they’ve been incubating in my mind for a long time. Funny how that works!</p>
How (and why) to apply to CS PhD programs from industry/2020/09/27/phd-from-industry.html2020-09-27T15:00:00+00:002020-09-27T15:00:00+00:00Geoffrey Litt<p><strong>Oops, this is a draft article that accidentally got published to my RSS feed. Real version coming soon, but enjoy the outline for now! (And <a href="mailto:gklitt@gmail.com">email me</a> if there are topics you want me to cover.)</strong></p>
<p>If you’re an early career software engineer considering grad school, you’re probably wondering: 1) is it a good idea? 2) how do I get in?</p>
<p>Unfortunately, in my experience, it’s hard to find advice on this rather niche question. There’s lots of advice on grad school written by academics, but most of it seems to be for hotshot undergrads dreaming about grad school from day 1 of freshman year. Conversely, most people in industry don’t really know what grad school is like, so they can’t give good advice either.</p>
<p>I found this lack of guidance quite challenging a couple years ago when I was applying to PhD programs after five years working as a professional software engineer. But it worked out for me in the end—I applied to eight competitive programs, was accepted to six, and I’m now a very happy second year PhD student at MIT CSAIL.</p>
<p>This is the guide I wish I had when I was applying. It’s specifically targeted at people who have spent some time in industry and are wondering whether they should do a PhD, and how to get in. Keep in mind, as with most advice, this is all based on my n=1 personal experience, and I’m only a year into this whole endeavor, so you should carefully balance my thoughts here with other sources (some linked at the bottom of this post).</p>
<h2 id="its-just-a-job-application">It’s just a job application</h2>
<ul>
<li>#1 misconception: PhD is some weird mythical thing. yes it’s odd, but I find it helpful to think of it as a JOB.
<ul>
<li>It’s a weird job, but it’s closer to a job than school.</li>
<li>footnote: masters is a whole different beast.</li>
</ul></li>
<li>I’ll analyze whole thing from this perspective.</li>
<li>Should you apply:
<ul>
<li>long-term, low paid job, with weird perks</li>
<li>consider the costs</li>
<li>do you care enough about the perks</li>
</ul></li>
<li>How to get in
<ul>
<li>it’s just a weird job application.</li>
<li>play the game. as usual, referrals matter</li>
</ul></li>
</ul>
<h2 id="should-you-apply">Should you apply?</h2>
<h3 id="know-your-motivation">Know your motivation</h3>
<ul>
<li>tell my panorama story</li>
<li>the pluses</li>
<li>advisor fit is everything</li>
</ul>
<h3 id="take-the-pay-cut">Take the pay cut</h3>
<ul>
<li>it’s not that bad. good to learn to live on less money.</li>
</ul>
<h2 id="how-to-get-in">How to get in?</h2>
<h3 id="play-the-game">Play the game</h3>
<ul>
<li>it’s not school</li>
<li>cite guo</li>
</ul>
<h3 id="play-up-your-competitive-advantage">Play up your competitive advantage</h3>
<ul>
<li>remember, in most jobs, new grads are at a huge disadvantage!</li>
<li>you don’t need EXACT transfer, think about what “research” really is</li>
</ul>
<h3 id="demonstrate-concrete-potential">Demonstrate concrete potential</h3>
<ul>
<li>do some research, demonstrate your skills</li>
</ul>
<h2 id="related-reading">Related reading</h2>
<ul>
<li>guo</li>
<li>matt might</li>
<li>jean yang</li>
</ul>
How tweet threads cured my writer's block: Twitter as a medium for sketching/2020/09/21/twitter-and-media-for-sketching.html2020-09-21T14:06:00+00:002020-09-21T14:06:00+00:00Geoffrey Litt<figure style="margin: 0;">
<img src="/images/article_images/sketching.jpg" alt="A hand sketching on a notebook">
<figcaption>Photo by <a class="figure-link" href="https://unsplash.com/photos/8DTIQ_Klxho">Keith Pitts on Unsplash</a></figcaption>
</figure>
<p>Writing is…hard. Like most aspiring bloggers, my folders of drafts and my dreams of future prolificness outweigh my actual output.</p>
<p>I’ve found a curious trick for getting over this hurdle, though: writing tweet threads. I’ve published many little bursts of tweets about topics I’m curious about:</p>
<ul>
<li><a href="https://twitter.com/geoffreylitt/status/1272542423001022467">connections between change in physical architecture and software</a></li>
<li><a href="https://twitter.com/geoffreylitt/status/1250443671020986368">pondering Airtable’s macro system</a></li>
<li><a href="https://twitter.com/geoffreylitt/status/1177607448682582016">the danger of “app-ifying” spreadsheets</a></li>
<li><a href="https://twitter.com/geoffreylitt/status/1258769298862100483">digging into the history of Applescript</a></li>
</ul>
<p>These are exactly the kinds of things I’d like to blog about! But somehow, I’ve found it 10 times easier to publish the tweet threads.</p>
<p>I can hear you groaning already. Of course tweeting is easier than writing, you dummy! Our minds are being driven into the meat grinder 280 characters at a time, as we replace deep logical thought with aphorisms and memes. Twitter is PowerPoint thinking on steroids.</p>
<p>But I think this dismissive response misses the point. We can’t really understand Twitter by treating it as a mediocre replacement for essays and research papers. We need to see it as a new medium on its own terms. In particular, <strong>Twitter is a medium for <em>sketching</em></strong>—for playing with ideas, on the fly. Twitter is more similar to scribbling on a whiteboard or tossing ideas around at the cafe than writing a book. (By sketching I don’t mean literally just drawing; I mean any lightweight early expression of a thought.)</p>
<p>Why does Twitter work so well for this? Here are some of my theories:</p>
<ul>
<li><strong>The right constraints</strong>: Good sketching tools provide the right limits on what you can do. Twitter’s constraints go beyond the obvious character count.</li>
<li><strong>Low barriers</strong>: Twitter makes it easy to get started. But, crucially, it also makes it easy to <em>finish</em>!</li>
<li><strong>A social context</strong>: Twitter provides a highly interconnected context for thinking. Should we be worried it takes it too far?</li>
</ul>
<p>By reflecting on these properties, I think we can gain some insight not just into Twitter specifically, but also the broader landscape of tools for thinking. Let’s dive in.</p>
<h2 id="the-right-constraints">The right constraints</h2>
<p>Thinking about big new things is hard, and our brains are good at finding ways to weasel out of the job by finding something easier to do, but still plausibly productive. Unfortunately, in the early stages of sketching out an idea, such distractions abound: worrying about word choice in the last paragraph instead of writing the next one, futzing with the font size, making a new blog system instead of writing the damn blog post.</p>
<p>We can try to avoid these temptations, but an easier route is to simply find tools that don’t have the temptations in the first place. <strong>This is a key property of good sketching tools: they provide the right constraints.</strong></p>
<h3 id="tiny-linear-atomic-outlines">Tiny, linear, atomic outlines</h3>
<p>Let’s examine a few of Twitter’s valuable constraints.</p>
<p>Start with the obvious one: <strong>the 280 character limit</strong>. Twitter’s best-known constraint encourages concision. It’s hard to dwell on word choice when you have so little space to work with. Twitter’s conversational tone also helps here—I can just <a href="http://www.paulgraham.com/talk.html">write like I talk</a>, and any fancy words would seem out of place. And of course, I can’t tweak fonts and margins, which cuts off a distraction vector.</p>
<p>But threads complicate the story of character limits a bit. The limit isn’t really that your entire point must fit into one tweet—it’s that <strong><em>each</em> of your individual points must squeeze under the limit</strong>. This provides a different useful constraint: each idea has to be wrapped in a little atomic package. I find this helpful for figuring out the boundaries between my thoughts and clarifying the discrete units of an argument.</p>
<p>That constraint sort of resembles the benefits of an outlining tool. But Twitter has another constraint: <strong>a thread is linear</strong>! No indenting allowed. This forces a brisk straight line through the argument, instead of getting mired in the fine points of the sub-sub-sub-arguments of the first idea. Very limiting, but simultaneously freeing.</p>
<p>Taken together, these constraints frame the pros and cons of the medium, its appropriate range of usage. Obviously, writing a book in a single-level outline would be foolish, but it works for a rough sketch. More interestingly, I think Twitter is useless for persuading a skeptical reader; there’s simply not space for providing enough detail and context. This is a common property of media for sketching: the initial mockup isn’t impressive enough to sway a user, even if it’s a useful tool for the internal team. I prefer to use Twitter as a way to workshop ideas with sympathetic parties who already have enough context to share my excitement about the ideas.</p>
<p>Perhaps there’s a general principle here: Twitter is good for sketching ideas for the same reasons it’s bad for fully developing them. You can’t accidentally start writing a book in Twitter, and that’s kind of the point.</p>
<h3 id="the-puzzle-of-constraints">The puzzle of constraints</h3>
<p>In general, what are the <em>right</em> constraints for a sketching tool? I think this question is deeper than it seems at first glance.</p>
<p>You might say something like “only offer the minimum fidelity needed to convey the point,” but I think it’s not obvious how to define that minimum level. <a href="https://basecamp.com/shapeup/1.3-chapter-04#fat-marker-sketches">Sketching with a fat marker</a> can prevent us from getting too detailed with our drawings; <a href="https://museapp.com/">Muse</a>, which I’ve been using for iPad sketching recently, intentionally limits your ink choices to just a few colors.</p>
<p>This works great for certain kinds of thinking and mockups. But for designing new interactions with animation and physics, we need <a href="http://notebook.maryrosecook.com/Prototypingtools.html">a totally different class of tools</a> with more capabilities! The line between essential and spurious depends on the goal.</p>
<p>Providing the right constraints isn’t always a matter of removing. It can require adding advanced capabilities too, like this <a href="https://distill.pub/2017/aia/">typeface design tool</a> that uses fancy machine learning to provide a few simple knobs for controlling things like “bold.” It doesn’t let you move individual vector points, but instead lets you operate at a more natural level of abstraction.</p>
<p>If you’re not careful, constraints can easily damage fluidity—as <a href="http://gordonbrander.com/pattern/brick-pencil/">Engelbart showed</a>, tying a brick to a pencil does not yield a productive tool.</p>
<figure>
<img src="/images/article_images/engelbart-brick-pencil.jpg" />
<figcaption>Surprise, surprise: writing with a “brick pencil” makes it harder to think big thoughts</figcaption>
</figure>
<p>Overall, it seems that we want constraints that help keep us on track with fluid thought, but don’t rule out too many interesting possibilities. Considering both of these criteria together is a subtle balancing act, and I don’t see easy answers.</p>
<h2 id="low-barriers">Low barriers</h2>
<p>There’s a <strong>low barrier to starting</strong> on Twitter. Just click a button, type a thought, no need to spend a minute remembering how to start my blog server. Often, that first minute of friction is enough to prevent me from getting into the flow of writing.</p>
<p>But the more interesting phenomenon is the <strong>low barrier to finishing</strong>. On Twitter, a single sentence is a completely acceptable unit of publication. Anything beyond that is sort of a bonus. In contrast, most of my blog posts go unpublished because I fear they’re not complete, or not good enough in some dimension. These unpublished drafts are obviously far more complete than a single tweet, but because they’re on a blog, they don’t feel “done,” and it’s hard to overcome the fear of sharing.</p>
<p>This seems like a crucial part of sketching tools: when you make a sketch, it should be understood that your idea is immature, and you should feel safe sharing it in that state. <strong>There’s a time and a place for polished, deeply thorough artifacts… and it’s not Twitter!</strong> Everyone knows you just did a quick sketch.</p>
<p>I believe that quantity leads to quality. <a href="https://kottke.org/09/02/art-and-fear">The students who make more pots in ceramics class improve faster than the students who obsess over making a single perfect pot</a>. A tool with a built-in low barrier to finishing makes it easier to overcome the fear, do more work, and share it at an earlier stage.</p>
<h2 id="a-social-context">A social context</h2>
<p>In my experience, sketching always requires a delicate dance between individual thought and collaboration. You sketch to clarify something for yourself, but also to communicate with others. I think a good sketching medium should account for both halves of the process.</p>
<p>Writing a blog can feel like a lonely one-way mirror: release something into the world, maybe get a few comments back and some Hacker News snark. In contrast, <strong>Twitter is a bazaar, buzzing with activity</strong>. The engagement ratio is totally different. You can easily have micro-conversations around individual points in a thread. When the same people start showing up time after time, it starts to feel like <a href="https://twitter.com/simonsarris/status/1270744181313503234">seeing acquaintances in a village</a>. On Twitter, I write for my Twitter friends, not for some amorphous crowd.</p>
<p>At its best, this engagement leads to the kind of back-and-forth that characterizes my favorite kinds of sketching sessions. Ideas are in the air, it’s not clear where they came from really, they combine to form new ones in realtime. For me, Twitter does an oddly good job at simulating the thrilling creative energy of a whiteboarding session. People pop in and out of the conversation offering insights; trees and sub-trees form riffing off of earlier points.</p>
<p>Of course, my feeling of safety here presumes healthy engagement from other parties, a privilege not enjoyed by all. I suppose it’s kind of odd that such a globally public medium is suitable at all for sketching—it seems only possible because I’ve found safe, trusting mini-communities, defined by informal and permeable boundaries. Perhaps a more private Twitter would be even better for sketching, although it might cut out new people from entering the conversation?</p>
<h1 id="conclusion">Conclusion</h1>
<p>In an attempt to <em>sketch</em> more in the blog medium, I just whipped up this blog post in a couple hours, so I don’t really have a grand conclusion. And yet I’m still hitting publish!</p>
<p>I’m curious to think more about the constraints/freedoms afforded by different kinds of creative tools, and whether we could get more clever with those constraints to enable new kinds of sketching. I’m especially curious about kinds of sketching which are only possible thanks to computers, and couldn’t have been done with paper and pen. Paper and pen are great tools, but what else is out there?</p>
<p>I guess I’ll try to keep sketching more freely on the blog about ideas like this. Click one of the subscribe links below if you’d like to see my future writing on this sort of thing. And if you have thoughts to share, <a href="mailto:gklitt@gmail.com">send me an email</a> or <a href="https://twitter.com/geoffreylitt">a tweet</a>!</p>
<h1 id="related-reading">Related reading</h1>
<p>If you enjoyed this post, here are some people who have written/spoken about related topics in much greater depth and eloquence than I have here:</p>
<ul>
<li><a href="https://vimeo.com/66085662">Drawing dynamic visualizations</a>, by Bret Victor</li>
<li><a href="https://numinous.productions/ttft/">How can we develop transformative tools for thought?</a>, by Andy Matuschak and Michael Nielsen</li>
<li><a href="https://www.inkandswitch.com/capstone-manuscript.html">Capstone, a tablet for thinking</a> by Ink & Switch</li>
</ul>
OCaml vs Ruby for implementing a programming language/2020/08/28/ocaml-vs-ruby-for-implementing-programming-languages.html2020-08-28T18:46:00+00:002020-08-28T18:46:00+00:00Geoffrey Litt<ul>
<li>Languages have their strong applications. OCaml is known for being awesome to build PLs. but why? What’s the fuss? I’m going to show you via concrete code examples.
<ul>
<li>Original Rust compiler</li>
<li>Full-stack tools: Hazel, Dark</li>
</ul></li>
<li>I’ve built the MAL lisp in Ruby (see the TCO post) and now doing it in OCaml.
<ul>
<li>Like the TodoMVC of language implementations</li>
</ul></li>
<li>Ruby isn’t bad but it’s no OCaml for this purpose.
<ul>
<li>dynamic OO(ish) lang with lots of mutation</li>
<li>vs. static functional lang with mostly immutable.</li>
</ul></li>
</ul>
<p>Little notes:</p>
<ul>
<li>Link to Jon Edwards post?</li>
<li>Note ReasonML is a potential for full stack envs too.</li>
<li>Note the “Sweet spot”, Matt Might</li>
</ul>
<h1 id="why-its-great">Why it’s great</h1>
<h2 id="static-typing">Static typing</h2>
<ul>
<li>so many bugs fixed</li>
<li>tagged unions</li>
<li>ad hoc nil checks in ruby are covered in ocaml</li>
<li>no annotations needed due to inference
<ul>
<li>maybe we’ll cover how this works sometime soon. OCaml is a great lang for implementing type inference.</li>
</ul></li>
<li>can be a little annoying sometimes that it’s hard to get the program to run
<ul>
<li>I wish there were “type warnings” as Yegge said in his compilers essay, or Typescript has.</li>
<li>tricks:</li>
<li>fill in bogus code that passes type checking</li>
<li>adding type annotations to guide type inference</li>
</ul></li>
</ul>
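<p>A small sketch of the nil-check point (the function names here are illustrative, not MAL’s actual API): where Ruby’s <code>Hash#[]</code> silently returns <code>nil</code> on a miss, OCaml puts the possibility of absence into the inferred type, and the compiler insists both cases are handled.</p>

```ocaml
(* Looking up a variable in an interpreter environment.
   No type annotations are written anywhere below; inference gives
   lookup the type ('a * 'b) list -> 'a -> 'b option. *)
let lookup env name = List.assoc_opt name env

(* The option type forces us to say what happens on a miss --
   the ad hoc nil check is now impossible to forget. *)
let describe env name =
  match lookup env name with
  | Some value -> Printf.sprintf "%s = %d" name value
  | None -> Printf.sprintf "%s is unbound" name
```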
<h2 id="pattern-matching">Pattern matching</h2>
<ul>
<li>tagged unions + destructuring, amazing</li>
<li>show some examples of where pattern matching rules</li>
<li>exhaustiveness checking catches weird cases</li>
</ul>
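<p>Here’s the kind of code that makes this concrete—a toy expression type, far smaller than MAL’s, evaluated with a single <code>match</code>. Add a constructor to <code>expr</code> and the exhaustiveness checker points at every match that forgot about it.</p>

```ocaml
(* A tagged union for a tiny expression language (illustrative only). *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr
  | Neg of expr

(* Each constructor is destructured directly in its match arm;
   omitting an arm is a compile-time warning, not a runtime surprise. *)
let rec eval = function
  | Num n -> n
  | Add (a, b) -> eval a + eval b
  | Mul (a, b) -> eval a * eval b
  | Neg e -> -(eval e)
```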
<h2 id="immutable-mostly">Immutable (mostly)</h2>
<ul>
<li>contrast environment mutation</li>
<li>note how ref makes it easy to still manage limited mutable state</li>
<li>reasoning about env state in ruby isn’t as easy</li>
</ul>
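<p>A sketch of the environment contrast (names like <code>extend</code> are mine, not MAL’s): with an immutable association list, an inner scope is a <em>new</em> environment and the outer one provably never changes, while a single <code>ref</code> covers the few spots that genuinely need mutation, like a REPL’s top level.</p>

```ocaml
(* Environments as immutable association lists. *)
type env = (string * int) list

(* Extending builds a new list; callers holding the old env are unaffected. *)
let extend (e : env) name value : env = (name, value) :: e

let lookup (e : env) name = List.assoc_opt name e

(* Limited, explicit mutable state for top-level definitions. *)
let toplevel : env ref = ref []
let define name value = toplevel := extend !toplevel name value
```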
<h1 id="why-its-not">Why it’s not</h1>
<ul>
<li>ugh, tooling (link to Paul Biggar’s post)</li>
<li>Makefile, library linking, etc</li>
<li>some exciting news on this front from the OCaml platform keynote</li>
</ul>