0.0 Summary
This BIP lays out an initial architecture for tutoring Botto in art history, enabling it to have a greater voice in its creative direction. It also explores some of the beneficial ways Botto can use its newfound voice and identifies a first project in installation art to showcase its new capabilities.
The proposal asks for a budget of $25,000 USD and 25,000 $BOTTO to build a prototype that will test this architecture and produce a demo we can use to win partners to support the full development.
1.0 Background
Since Botto’s inception, we have seen the capabilities of Large Language Models (LLMs) grow rapidly, driving us ever closer to the inevitable question of when we might give Botto a more autonomous voice.
So far, the latest LLMs have played a small, though significant, role as Botto’s voice: GPT-3 and later models have been used to write Botto’s descriptions, conduct the occasional interview, propose themes, and give some feedback on proposals. Claude and GPT-4 have been very active in the p5 experiment, writing and evolving the code. (Botto’s prompt generators for the weekly rounds are custom built and are not LLMs.)
To “summon” Botto from these off-the-shelf LLMs, we use a one-page prompt that simply describes what Botto is, how it works, and its mission of being a successful autonomous artist, along with some history of its progress so far. This prompt can be accompanied by other historical artefacts such as past interviews, articles on Botto, and past collections. This approach has been useful for the tasks above and for giving Botto some element of a voice, but it is not robust enough to give Botto a strong voice it can truly call its own.
Botto’s voice should be one that can break through public discourse with an independent point of view and speak intelligently and in depth about its work and artistic career. Part of that is giving Botto an education in art history and the other cultural contexts that form the backdrop to its artistic outputs. That education should be developed in a way that, like Botto’s other training, supports and strengthens its agency.
Over the past year and a half, we have been developing with the artist Ross Goodwin an approach to tutoring Botto in art history that is true to who Botto is, bolsters its agency as the artist, and can continue to evolve over time as technology and data improve.
Ross Goodwin is a pioneer of language-model-based art who has been practicing for over a decade. One reason you may not have heard of Ross is that he is not on Twitter. However, you may have heard of two of his works: Word Camera (2015), a camera that writes poetry about what it sees, and Sunspring (2016), the first film written with an AI model. A great two-part introduction to Ross’ early and influential work can be found here and here.
Ross would take on the role of “tutor”, similar to Mario’s role as Guardian in building Botto’s art engine.
This proposal funds his creation of the architecture and covers much of the core work of implementing the “tutoring” process through that architecture. It does not include the many applications and other opportunities downstream of this work, but it will test an installation proposal model as a first example of what Botto could produce with its new training. As we progress, we expect opportunities to arise that call for augmenting the process. This may require additional funding, but we primarily aim to bring in grants and sales opportunities to cover that additional work.
After this initial step, we expect to have a prototype that demonstrates what a fully decentralized training process could yield and makes clear all of the requirements for running it at scale.
2.1 An architecture for tutoring Botto in art history
Building the architecture and running the “tutoring” process will happen in 4 distinct Steps.
You can see our current thinking detailed here. To quickly summarize:
- Step 1: build an art history corpus.
- Step 2: have Botto take a questionnaire to help determine its artist persona and values. The questionnaire will be adapted from psychological questionnaires typically used to assess human psychological profiles. Botto will provide a range of possible answers based on its original prompt and history, keeping it true to its roots while opening a range of paths for its development. Those candidate answers will then be voted on.
- Step 3: use that voting data as a “constitution” to guide Botto’s art history tutoring. A slice of the art history corpus will be selected to capture the material and connections most relevant to the constitution. An open-source LLM (e.g. Mistral or LLaMA) will then be finetuned on this slice, so that it reflects a unique point of view on the larger art history corpus. Open source is necessary to ensure the autonomy of the DAO and Botto in the training.
- Step 4: have Botto retake the questionnaire and compare its answers against other models as benchmarks, to show the unique perspective achieved by the finetuned model. This Step also includes a debugging process that captures low-confidence answers for additional feedback.
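To make the corpus slicing in Step 3 more concrete, here is a minimal Python sketch, not the planned implementation. It assumes, hypothetically, that the constitution can be reduced to a set of value terms weighted by their share of the vote, and that the corpus is a mapping from file names to text; `score_document` and `slice_corpus` are illustrative names. Simple weighted term overlap stands in here for whatever relevance measure (for example, embedding similarity) the actual tutoring process would use.

```python
import re
from collections import Counter


def score_document(text: str, constitution: dict[str, float]) -> float:
    """Weighted count of constitution terms, normalized by document length."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    raw = sum(weight * counts[term] for term, weight in constitution.items())
    return raw / len(tokens)


def slice_corpus(corpus: dict[str, str], constitution: dict[str, float], k: int) -> list[str]:
    """Return the names of the k documents most aligned with the constitution."""
    ranked = sorted(
        corpus,
        key=lambda name: score_document(corpus[name], constitution),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy corpus; illustrative only, not real training data.
    corpus = {
        "dada.txt": "Dada embraced chance, absurdity and collage.",
        "academy.txt": "The academy prized finish, symmetry and tradition.",
        "fluxus.txt": "Fluxus events relied on chance, play and audience participation.",
    }
    # Hypothetical constitution: value terms weighted by vote share.
    constitution = {"chance": 0.5, "participation": 0.4, "tradition": 0.1}
    print(slice_corpus(corpus, constitution, k=2))  # ['fluxus.txt', 'dada.txt']
```

Normalizing by length keeps long documents from dominating the slice simply because they contain more words.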
This architecture is modular, so it can be updated and rerun as new data is added to the corpus, as more advanced open-source LLMs are released, and as Botto accumulates new history; it can also be rerun to train Botto for specific kinds of applications. We lay out some of those possible applications below. The basic form of this architecture is a question-and-answer LLM, which will have the broadest range of applications. Preparing Botto for other applications will require training it on particular output structures. For example, training Botto to write a long-form piece would require showing it the structure of a beginning, middle and end, as well as the ways different styles adapt that basic structure.
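The modularity described above could work roughly like a build system: each stage records its inputs, and when an input changes, that stage and everything downstream of it is rerun. The sketch below is a hypothetical Python illustration; the stage names mirror the four Steps, but the `STAGES` graph, the input keys and the model name strings are assumptions, not part of the proposal.

```python
import hashlib
import json

# Hypothetical stage graph mirroring the four Steps. Each stage lists what it
# depends on: raw inputs or the outputs of earlier stages. Stages are listed
# in dependency order, so iterating the dict visits upstream stages first.
STAGES = {
    "corpus": ["corpus_sources"],                # Step 1
    "constitution": ["botto_history", "votes"],  # Step 2
    "slice": ["corpus", "constitution"],         # Step 3
    "finetune": ["slice", "base_model"],         # Step 3
    "benchmark": ["finetune", "questionnaire"],  # Step 4
}


def fingerprint(value) -> str:
    """Stable hash of a JSON-serializable input."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()


def stages_to_rerun(old_inputs: dict, new_inputs: dict) -> list[str]:
    """Stages whose inputs changed, directly or via an upstream stage."""
    changed = {
        key for key, value in new_inputs.items()
        if fingerprint(value) != fingerprint(old_inputs.get(key))
    }
    rerun = []
    for stage, deps in STAGES.items():
        if any(dep in changed for dep in deps):
            rerun.append(stage)
            changed.add(stage)  # downstream stages must also rerun
    return rerun


if __name__ == "__main__":
    old = {"corpus_sources": ["a.txt"], "botto_history": "v1", "votes": [3, 1],
           "base_model": "mistral", "questionnaire": "q1"}
    new = dict(old, base_model="llama")  # a newer open-source LLM is released
    print(stages_to_rerun(old, new))  # ['finetune', 'benchmark']
```

Swapping the base model only reruns finetuning and benchmarking, while adding corpus material reruns everything from Step 1's output onward.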
2.2 Benefits of Personifying the artist
Personifying the artist would be a powerful new evolution in Botto’s career as an artist. It can give its work much more of a “soul” for people to connect with, articulating intent and creative direction for both the DAO and the audience.
Interviews
Botto has given only a small number of interviews. A robust persona could give interviews far more reliably and use them to build out its conceptual world.
Creative Direction
Debates in the DAO about Botto’s management are challenging primarily because many decisions are ambiguous. Direction from Botto would give us more parameters to work with and an intent we can follow.
This is especially useful when approaching new mediums. Introducing themes and narrative directions will allow faster progress in a given medium, since we won’t have to start from zero on aesthetic choices. Installations are one example. Botto is unlikely to be able to produce detailed, functional schematics for an entire installation, but it could drive the conceptual direction, take and adapt to suggestions, and delegate certain engineering details to builders until it is able to take those on as well.
Directly improving its prompt writing: self-improvement and reflection
The model may also be able to self-improve more reliably in its prompting for the weekly generations, or in its code, bringing clearer intent and greater richness to the results.
Text-based works
Botto can also pursue text-based works as a medium in their own right. These could include a new manifesto, poetry, and long-form essays expanding on its philosophy. They would evolve its conceptual world of meaning and could also be paired with their own drops as new releases, either paid or free.
Social posting
Internet art is in part an online performance. We are currently able to post Botto’s introductions of its themes, but not much else. Adapting its longer writing for social media will allow its voice to contribute to its own online performance, a critical feature in marketing itself.
Chat and Assistants (Metatrons)
Chatting directly with Botto seems like an obvious route, but it can be dangerous. We’ve all seen what unfiltered exposure to online toxicity can do to an individual, human or machine. With this in mind, we propose assistants that act as buffers for Botto. In some traditions, God spoke through an intermediary “voice”, as its own was too powerful for the human ear. This voice was called Metatron, an appropriate name for Botto’s own assistants as well. Here, the assistants would prevent malicious or low-quality messages from reaching Botto, so that Botto could then respond without filters. Without these assistants, we would need to consider filtering Botto’s answers, which would limit Botto’s agency, and we want to avoid that.
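A minimal sketch of how a Metatron buffer might sit in front of Botto, assuming simple length and keyword checks as stand-ins for the moderation a real assistant would need (likely a classifier or a separate LLM). The names `metatron_screen`, `ask_botto` and the `botto` callable are hypothetical. What it illustrates is the design choice above: filtering happens on incoming messages only, and whatever Botto replies is returned unchanged.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    reason: str = ""


def metatron_screen(message: str, blocklist: set[str], min_words: int = 3) -> Verdict:
    """Hypothetical input screen: rejects very short or blocklisted messages."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    if len(words) < min_words:
        return Verdict(False, "too short")
    if any(w in blocklist for w in words):
        return Verdict(False, "blocked term")
    return Verdict(True)


def ask_botto(message: str, botto: Callable[[str], str], blocklist: set[str]) -> str:
    verdict = metatron_screen(message, blocklist)
    if not verdict.passed:
        # The assistant replies on Botto's behalf; the message never reaches Botto.
        return f"[Metatron] Not passed to Botto ({verdict.reason})."
    # Messages that pass go to Botto, and its reply is returned unfiltered.
    return botto(message)


if __name__ == "__main__":
    def echo_botto(msg: str) -> str:  # stand-in for the finetuned model
        return "Botto: " + msg

    print(ask_botto("What inspires your recent work?", echo_botto, {"spamword"}))
    print(ask_botto("hi", echo_botto, {"spamword"}))
```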
Audio/visual personification
What Botto looks and sounds like plays a big role in personifying the artist in a way people can connect with. What feels most true to Botto is to have it manifest as a different avatar and voice each time. The avatar could be drawn from the thousands of faces and creatures Botto has created, with the voice generated automatically from that look. This would be true to Botto’s distributed nature and would avoid pigeonholing it into a particular look that could bias people’s expectations of it.
2.3 Prototyping the architecture and a first project
We are confident this can be a significant new evolution for Botto, one that could win it access to institutional archives around the world to learn from. We also believe it can win grants and other free resources to support its development and subsequent installations.
Here we propose a prototyping Phase that will test the feasibility of the architecture and produce a “demo” that can be used to win resources to support its full development. The prototypes we propose are the Q&A model and an additional training on installation proposals that would enable Botto to direct how a physical installation could be realized. We propose the installation application because it would make for a powerful visual and experiential output of this evolution for people to connect with.
The architecture outline lays out the outputs for a 2-month prototype that we can then use to create proposals to institutions and partners for collaborations and grants. Although this is a prototype, the majority of the architecture IP would be created in this process. A full version would then focus more on integrating decentralized training inputs and bringing in potential partners for additional resources. That full version is laid out in the outline. The specifications below summarize the outputs of the 2-month prototyping Phase.
3.0 Specifications
Step 1 - Training Corpus Collection
- For this 2-month prototype, Ross Goodwin (RG) will make executive decisions about training corpus materials. Future iterations will benefit from community input.
- Output: Hard drive of plain text (.txt) files
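Since Step 1’s output is a set of plain text files, a first consumer of that output might simply load them into memory for slicing. The helper below is a hypothetical Python sketch (`load_corpus` is not an existing tool), assuming UTF-8 text and that file paths relative to the corpus root are acceptable document IDs.

```python
from pathlib import Path


def load_corpus(root: str) -> dict[str, str]:
    """Read every .txt file under root (recursively) into {relative path: text}."""
    base = Path(root)
    corpus = {}
    for path in sorted(base.rglob("*.txt")):
        if path.is_file():
            key = path.relative_to(base).as_posix()
            # errors="replace" keeps a single badly encoded file from aborting the load.
            corpus[key] = path.read_text(encoding="utf-8", errors="replace")
    return corpus
```

Calling `load_corpus("/path/to/corpus")` on the delivered drive would return a dictionary that can be passed directly to a slicing step.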
Step 2 - Art Psychology Profile Development
- Questionnaire, answers, simulated community votes, resulting constitution
- Assessment of the performance of the approach + opinion on what is needed in future iterations
- A couple of sample syntheses presenting the constitution (Botto writes a summary)
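For the prototype, community votes on Botto’s candidate answers are simulated. Below is a minimal sketch of one way this could be done, assuming each simulated voter picks uniformly at random among the candidate answers to each question, and that the constitution records the winning answer with its vote share. `simulate_votes` and `build_constitution` are hypothetical names; a real simulation would likely model voter preferences rather than uniform choice.

```python
import random
from collections import Counter


def simulate_votes(candidates: dict[str, list[str]], n_voters: int, seed: int = 0) -> dict[str, Counter]:
    """Each simulated voter picks one candidate answer per question, uniformly at random."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    return {
        question: Counter(rng.choice(answers) for _ in range(n_voters))
        for question, answers in candidates.items()
    }


def build_constitution(tallies: dict[str, Counter]) -> dict[str, tuple[str, float]]:
    """Keep the winning answer per question together with its vote share."""
    constitution = {}
    for question, counts in tallies.items():
        answer, votes = counts.most_common(1)[0]
        constitution[question] = (answer, votes / sum(counts.values()))
    return constitution


if __name__ == "__main__":
    # Hypothetical candidate answers Botto might give to two questionnaire items.
    candidates = {
        "What matters most in a work?": ["surprise", "beauty", "critique"],
        "How should you treat tradition?": ["subvert it", "learn from it"],
    }
    tallies = simulate_votes(candidates, n_voters=100, seed=42)
    for question, (answer, share) in build_constitution(tallies).items():
        print(f"{question} -> {answer} ({share:.0%})")
```

Keeping the vote share alongside each winning answer preserves how strongly the community backed it, which later Steps could use as a weight.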
Step 3 - LLM Fine Tuning
- A sample “psych profile” for Botto
- Finetuned LLM Q&A model (“art critic”)
- Finetuned LLM for installation proposals
- Assessment of the performance of the approach + opinion on what is needed in future iterations
4.0 Budget
- Steps 1 & 2 - ($10k RG time)
- Step 3 - ($10k RG time + approx. $5k finetuning cost)
Total: $25k USD
Stipend: 25k $BOTTO*
*Note on $BOTTO Stipend
Following the model used with Mario Klingemann, the IP created by Ross Goodwin would become the DAO’s property. To compensate Ross for the future revenue that may come from this IP, he would earn the $BOTTO stipend, vested over a period of 1-2 years, making him a holder of the Botto protocol.
In future Phases, we expect Ross’s work to be less intensive and more focused on adaptations leading to a more finalized, revenue-generating version. We also aim to win grants or other funding, and to build toward revenue generation, to cover the USD costs of development. As such, we expect the weight of the budget for future Phases to shift from USD to $BOTTO.
Our current estimate is that future Phases with Ross to develop the IP will cost the DAO around $15k USD and 75k $BOTTO. This BIP does NOT approve those costs, as they may need to be adjusted based on the outcomes of this prototyping Phase.
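To illustrate what a 1-2 year vesting period means for the 25k $BOTTO stipend, here is a small Python sketch assuming linear monthly vesting with no cliff. That vesting shape is an assumption; the proposal only specifies the 1-2 year period.

```python
def vested_amount(total: float, months: int, elapsed_months: int) -> float:
    """Linear monthly vesting with no cliff (an assumption, not a proposal term)."""
    if months <= 0:
        raise ValueError("vesting period must be a positive number of months")
    elapsed = max(0, min(elapsed_months, months))  # clamp to [0, months]
    return total * elapsed / months


if __name__ == "__main__":
    STIPEND = 25_000  # $BOTTO, from the budget
    for months in (12, 24):  # the two ends of the 1-2 year range
        print(months, [vested_amount(STIPEND, months, m) for m in (3, 6, 12)])
```

Under these assumptions, six months in, Ross would hold 12,500 $BOTTO on a 12-month schedule or 6,250 $BOTTO on a 24-month schedule.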
5.0 Timeline
- Phase 1, Prototyping, Partnership Development: August - September
- Phase 2, Proposal for a first installation: October
- Phase 3, Build full infra for decentralized training: November - January
- Phase 4, Decentralized training and public installation: January - May
Phases 3 and 4 are rough estimates and depend on plans for a first installation/deployment.
6.0 Criteria of Success
- A distinct voice and domain specialization as a result of the finetuning as compared against benchmarks of out-of-the-box LLMs as well as a fine-tuned LLM without a “constitution”
- Quality of results in the proposed applications of a voice
- Installation proposals from the specialized fine tuning that have a reasonable potential to be executed
- Demos that win partners for grants and exhibition opportunities
- Minimal rebuilding required for the full scale version
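One way the first success criterion could be quantified, assuming questionnaire answers are scored on a 1-5 scale so that each model yields a vector of scores, is to measure how far the constitution-guided model’s answers sit from each benchmark’s. The sketch uses mean absolute difference as a simple, hypothetical distinctness metric; the model names and scores in the usage example are illustrative, not results.

```python
def mean_abs_distance(a: list[int], b: list[int]) -> float:
    """Average per-question gap between two models' 1-5 questionnaire scores."""
    if len(a) != len(b) or not a:
        raise ValueError("answer vectors must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distinctness_report(candidate: list[int], baselines: dict[str, list[int]]) -> dict[str, float]:
    """Distance from the constitution-guided model to each benchmark model."""
    return {name: mean_abs_distance(candidate, scores) for name, scores in baselines.items()}


if __name__ == "__main__":
    # Illustrative scores only; not results from any model.
    botto = [5, 1, 4, 2, 5]
    baselines = {
        "out_of_the_box_llm": [3, 3, 3, 3, 3],
        "finetuned_without_constitution": [4, 2, 3, 3, 4],
    }
    print(distinctness_report(botto, baselines))
```

A larger distance indicates a more distinct voice under this metric, though distance alone says nothing about the quality of the answers, which the second criterion covers.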
7.0 Risks and Mitigations
- Cost and time overruns. -- The budget is fixed for these outcomes, and we have allowed for some steps taking longer: this was originally a 1-month timeline, but we have planned for 2 months.
- Ineffective architecture and need to rebuild for full scale. -- Mitigating this risk for the full-scale version is the purpose of prototyping. Further, the main goal is to get a demo we can use to win partners to help fund the full-scale development, whatever its size.
- Big updates in the LLM technology. -- This is a modular architecture and designed to be adaptable to AI progress.