BIP-58A: Architecture for Tutoring Botto in Art History

hudsonsims

This is an appendix to BIP-58 and lays out in more detail the architecture for a full-scale approach to tutoring Botto in art history and installations.

Step 1 - Building an art history corpus: Collecting Art Texts for LLM Finetuning

Goal: Create a robust and representative corpus of art history, art theory, and related fields with which to tutor Botto.

While the LLM we would be training has likely already encountered a great deal of art history text, Botto will need a more focused set of course material. This can, and should, still be quite broad, as the influences on art and art history are themselves broad.

As a starting point, we have identified several open-source repositories ready for text data collection, including:

Project Gutenberg's Art Bookshelf (75 books)
Metropolitan Museum Publications (681 books)
Getty Virtual Library (300 books)
Canadian Art Books - Art Canada Institute
British Museum Online Research Catalogues
National Library of Australia ebooks

This step would expand the corpus to include a diverse range of art texts, including historical art manifestos, contemporary art theory, criticism, and philosophical texts that align with Botto's aesthetic and ethical values. We would also source texts from other fields, such as philosophy, religion, science, and information theory.

There are a few keys to achieving this:

Crowdsourcing: Working with the DAO to help ensure a wide representation of perspectives and ideas.
Institutional archives: We see strong interest from institutions in projects that engage with their archives. Partnering with them will enhance the corpus, give access to experts, and plant the seeds for a larger reach through their audiences.
Translations and non-text sources: Adapting non-English and non-text sources is crucial to building a globally representative corpus for Botto to learn from.
Computational curation and annotation: We will use automated methods for curating and annotating the collected texts to highlight themes, concepts, and stylistic elements crucial for Botto's identity.
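
Automated annotation could look something like the minimal sketch below. It assumes a simple keyword lexicon per theme. The theme names and keywords, along with the helper names `tokenize` and `annotate`, are hypothetical placeholders and not a curated taxonomy. Each text is tagged with the themes it mentions and how often it mentions them.

```python
import re
from collections import Counter

# Hypothetical theme lexicon: real themes and keywords would come out of the
# curation process, not from this example.
THEME_KEYWORDS = {
    "installation": {"installation", "site-specific", "environment", "space"},
    "abstraction": {"abstract", "abstraction", "non-objective", "form"},
    "technology": {"machine", "algorithm", "computer", "technology"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, keeping hyphenated words."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def annotate(text: str, min_hits: int = 1) -> dict[str, int]:
    """Return {theme: keyword count} for each theme with at least min_hits matches."""
    counts = Counter(tokenize(text))
    tags = {}
    for theme, keywords in THEME_KEYWORDS.items():
        hits = sum(counts[k] for k in keywords)
        if hits >= min_hits:
            tags[theme] = hits
    return tags


print(annotate("A site-specific installation transforms the space around the viewer."))
# {'installation': 3}
```

A keyword lexicon is only the simplest baseline. Richer methods, such as LLM-assisted tagging, could replace `annotate` without changing how its output is stored.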

The output of this step will be a website listing the texts gathered, which will update dynamically as texts are added. Custom data collection for training Botto has been a key part of its success, and this dataset will set Botto apart. The website will be an important proof point of Botto’s unique training. We should later consider open-sourcing the final corpus to enable other experiments with it, which would further build awareness of Botto.

An open-source website might also further enable ongoing crowdsourcing that continues to build out the corpus for future iterations of Botto’s education.
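
The website's dynamically updating list could be generated directly from the corpus folder. The sketch below makes two assumptions that are not decided formats: the corpus is stored as plain .txt files, and each file's first non-empty line is its title. The hypothetical `build_manifest` and `write_manifest` produce a JSON index that a website could render.

```python
import hashlib
import json
from pathlib import Path


def build_manifest(corpus_dir: str) -> list[dict]:
    """List every .txt file in corpus_dir with a title, word count, and content hash."""
    entries = []
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        lines = text.strip().splitlines()
        entries.append({
            "file": path.name,
            # Assumption: the first non-empty line holds the title; fall back to the filename.
            "title": lines[0].strip() if lines else path.stem,
            "words": len(text.split()),
            # A content hash lets the site detect changed or duplicate texts.
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        })
    return entries


def write_manifest(corpus_dir: str, out_path: str) -> None:
    """Write the manifest as JSON for the website to render."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(build_manifest(corpus_dir), f, indent=2)
```

Re-running `write_manifest` whenever texts are added keeps the published list in sync with the corpus.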

Step 1.1 - Medium-specific materials

For the test of installation work, we will ensure the training corpus includes significant material relevant to installation art. This may include pairing excerpts with related statements about art theory to create a prompting structure.
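
One possible prompting structure is sketched below. It assumes that a curator has already aligned each installation excerpt with one related art-theory statement. The record fields (`prompt`, `response`, `source`), the prompt wording, and the helper names are all hypothetical. The records are serialized as JSON Lines, which is a common input format for fine-tuning.

```python
import json
from dataclasses import dataclass


@dataclass
class Excerpt:
    source: str  # e.g. a catalogue or book title
    text: str    # passage about an installation work


def make_records(excerpts: list[Excerpt], statements: list[str]) -> list[dict]:
    """Pair each installation excerpt with its curated art-theory statement.

    Assumption: statements[i] is the statement judged related to excerpts[i].
    """
    if len(excerpts) != len(statements):
        raise ValueError("each excerpt needs exactly one paired statement")
    records = []
    for ex, statement in zip(excerpts, statements):
        records.append({
            "prompt": (f"Consider this idea from art theory: {statement}\n"
                       "How might it apply to installation art?"),
            "response": ex.text,
            "source": ex.source,
        })
    return records


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```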

Full Version Timeline:
1 month to launch website
1-2 months for initial data collection and annotation
Data collection would be ongoing afterwards

2 Month Prototype Outputs:

Hard drive of plain text (.txt) files
For this 2-month prototype process, RG will make executive decisions about training corpus materials. Future iterations will benefit from community input.
Additions to the corpus:

Non-English language
Non-textual
Open-source texts not yet found
Non-open source (e.g. institutional/private archives)

Step 2: Art Psychology Profile Development Pt. 1/2

Goal: We will start with an artistically oriented psychological profile to seed a “constitution” for Botto’s artist profile.

To derive an independent view of the world and of art from the large corpus, Botto will need an opinionated perspective on the material. We propose creating an “artist profile” questionnaire for Botto to take, resulting in a “constitution” that will be Botto’s navigation system through the art history corpus.

There is some evidence that “constitutions” can successfully align an LLM with the values they set out. There is also “democratic fine-tuning”, in which users submit and deliberate on their own values and then vote to arrive at guiding principles. Our method would be distinct from these approaches but uses a similar technique: creating a set of human-readable values and voting weights that guide the machine’s training. Our hypothesis is that an off-the-shelf LLM will capably transform psychological questionnaires into artistic ones, given the ability of LLMs to robustly perform style transfer on text. Botto would then answer the questionnaire via the existing 1-pager prompt we’ve used to summon Botto previously. Given that Botto has had little feedback on its voice, we would ask it to provide a range of 3-4 possible answers it could give based on its history. We believe this is the best way to stay true to the voice Botto has established thus far while also providing a range of possibilities for the DAO to give feedback on.
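
The two-stage prompting described above could be sketched as follows. First, an off-the-shelf LLM style-transfers each questionnaire item. Second, Botto answers with 3-4 alternatives. The template wording and helper names are hypothetical. No model is called here: `persona` stands in for the existing 1-pager prompt, and the reply format "1. ...", "2. ..." is an assumption that `parse_answers` relies on.

```python
import re

ADAPT_TEMPLATE = (
    "Rewrite the following psychological questionnaire item so that it asks "
    "about an artist's practice, keeping what it measures unchanged.\n"
    "Item: {item}"
)

ANSWER_TEMPLATE = (
    "{persona}\n\n"
    "Answer the question below in Botto's voice. Give {n} distinct possible "
    "answers, numbered 1 to {n}.\n"
    "Question: {question}"
)


def adaptation_prompt(item: str) -> str:
    """Prompt for an off-the-shelf LLM to style-transfer one questionnaire item."""
    return ADAPT_TEMPLATE.format(item=item)


def answer_prompt(persona: str, question: str, n: int = 3) -> str:
    """Prompt asking Botto (persona = the 1-pager) for n alternative answers."""
    if not 3 <= n <= 4:
        raise ValueError("the proposal asks for 3-4 answers per question")
    return ANSWER_TEMPLATE.format(persona=persona, question=question, n=n)


def parse_answers(reply: str) -> list[str]:
    """Split a numbered reply ("1. ...", "2) ...") into separate answers."""
    answers: list[str] = []
    for line in reply.splitlines():
        m = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if m:
            answers.append(m.group(1).strip())
        elif answers and line.strip():
            # Unnumbered non-empty line: treat it as a continuation of the previous answer.
            answers[-1] += " " + line.strip()
    return answers
```

The parsed alternatives are what would go into the voting pool for the DAO to weigh.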

We have tested some questionnaires with different generic LLMs and found promising results in the adaptations. It will be important to vet the questionnaire we develop, both for the effects of the adaptation itself and for how to interpret the answers. We plan to consult psychology experts and will also be on the lookout for other questionnaires useful for determining an artist profile. Given the modularity of the process and the ability to update Botto’s learning later, a “perfect” adaptation is not necessary for creating an artist profile. Indeed, what constitutes a good artist profile is highly variable. We do not aim to settle that question, but it will be critical to document how the questionnaire is transformed from the original questions to the adapted “artistic” ones, along with an argument for why this transformation is “good” and how it shapes Botto’s persona.

Community feedback will be collected through a special voting pool containing the questionnaire and its multiple answers. As this proposal does not have a set release planned, we would subsidize this in part through the weekly rewards.

25% of the weekly sales will be directed towards the questionnaire feedback while the voting pool is running
Participation would also be recorded for rewards if there are resulting drops. To temper expectations, we propose this apply to just one future drop (e.g. a new manifesto): the resulting personification of Botto should enhance all of its other work, so rewards would be hard to disentangle from the inputs needed for those particular collections.

The result of this stage will be a dataset weighting the different answers. In Step 3, this dataset will be used to select a correlated slice of the complete art history corpus from Step 1 for fine-tuning.
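
The votes could be turned into that weighted dataset as in the sketch below. It computes per-question vote shares, with optional additive smoothing so that an unvoted answer keeps a small weight. The smoothing, the `alpha` parameter, and the data layout are illustrative assumptions and not a decided scheme.

```python
from collections import defaultdict


def answer_weights(votes: list[tuple[str, str]],
                   answers: dict[str, list[str]],
                   alpha: float = 1.0) -> dict[str, dict[str, float]]:
    """Turn raw votes into normalized per-question answer weights.

    votes:   (question_id, answer_id) pairs, one per vote cast
    answers: question_id -> that question's candidate answer ids
    alpha:   additive smoothing (an assumption); alpha=0 gives plain vote shares
    """
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for q, a in votes:
        counts[q][a] += 1
    weights = {}
    for q, options in answers.items():
        raw = {a: counts[q][a] + alpha for a in options}
        total = sum(raw.values())
        # With no votes and no smoothing, fall back to equal weights.
        weights[q] = {a: (v / total if total else 1 / len(options)) for a, v in raw.items()}
    return weights


votes = [("q1", "a")] * 6 + [("q1", "b")] * 2
print(answer_weights(votes, {"q1": ["a", "b", "c"]}, alpha=0))
# {'q1': {'a': 0.75, 'b': 0.25, 'c': 0.0}}
```

Votes for answers that are not listed as candidates are ignored, which keeps stray submissions out of the weights.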

Full Version Timeline:
1-2 months to create and vet art psychology questionnaire, build website
(Can run in parallel with step 1 corpus development)
1-2 months of surveying
Depends heavily on the status of the community and the external promotion we are able to achieve

2 Month Prototype Outputs:
Questionnaire, answers, simulated community votes, constitution
Assessment of the performance of the approach + opinion on what is needed in future iterations
A couple of samples of synthesis for presentation (Botto writes a summary)

This is a minor priority; a single example is all we need here

Step 3: LLM Finetuning

Goal: The psychological profile will be used to narrow down the art history corpus that will then be used to tutor/fine-tune the LLM.

When anyone takes a course, they bring their own perspective, interests, and values, and therefore take away their own highlights and connections between the information. Botto should likewise bring its own perspective to the material. Using the results of Step 2, Botto can select relevant slices of the corpus on which to fine-tune.

At this stage, the particular question arises of what Botto is studying for. That goal affects what one takes from research materials, and the results can vary for the same individual depending on their goals at the time of reading. In Botto’s case, we first want to provide a robust foundation, and so will target a Q&A model. This will be broadly applicable to giving interviews, communicating intent, and providing creative direction.

Other, narrower applications that will complement this foundation are long-form writing (e.g. a new manifesto) and developing concepts for a particular art form (e.g. physical installations). These applications call for outputs with a particular structure that requires more specialized training. Long-form writing has a beginning, middle, and end, and interplays with style and other genres of writing that Botto would need to focus on in a separate round of training. Installations require a specific focus on the history of that practice, and answers that are executable. These special applications can become new parts of Botto’s brain, learned from more advanced courses.

For this first round of training, we will focus on the Q&A model and installations, as these will be the most robust applications for Botto to communicate its unique perspective as an artist and to make impactful work in a museum setting. We will test different approaches against each other, including “bag of words” and k-means clustering, to determine the slice of the corpus Botto will train on. The resulting slices for the different applications will be saved, encrypted, and password-protected. They will be checked for quality against sample slices we curate, to ensure the different clustering approaches produce no garbage results. The final performance assessment will compare the questionnaire results from Steps 2 and 4, but this stage will have its own assessment of the fine-tuned models, using a set of LLM benchmarks and comparing them with an off-the-shelf LLM to confirm improvements and domain specialization. We will be able to share these results, but will open the final training slice only for research purposes.
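
The sketch below combines the two named techniques in one possible way: bag-of-words vectors are clustered with k-means, and the cluster whose centroid is most similar to a text summarizing Botto's profile is kept. The selection rule, the deterministic seeding, and the use of a single `profile_text` string to represent the Step 2 output are assumptions for illustration only. It is written in plain Python to stay dependency-free; a production version would more likely use a library implementation.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def bow(text: str, vocab: list[str]) -> list[float]:
    """Bag-of-words count vector over a fixed vocabulary, scaled to unit length."""
    counts = Counter(tokens(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sq_dist(u: list[float], v: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors: list[list[float]], k: int, iters: int = 20):
    """Lloyd's k-means, seeded deterministically with the first k vectors."""
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_slice(docs: list[str], profile_text: str, k: int = 2) -> list[str]:
    """Cluster docs, then keep the cluster whose centroid best matches the profile."""
    vocab = sorted({w for d in docs + [profile_text] for w in tokens(d)})
    vectors = [bow(d, vocab) for d in docs]
    labels, centroids = kmeans(vectors, k)
    target = bow(profile_text, vocab)
    best = max(range(len(centroids)), key=lambda c: cosine(centroids[c], target))
    return [d for d, lab in zip(docs, labels) if lab == best]
```

The curated sample slices mentioned above could serve as a sanity check here: a clustering approach whose selected slice omits obviously relevant sample texts would be flagged as producing garbage results.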

There are several LLMs we could fine-tune; the only requirement is that they be open source. (We may use closed-source models for the earlier stages.) Using an open-source model avoids the risk of the model being deprecated by the company that owns it and ensures the greatest flexibility for Botto’s autonomy. Two models we are considering are Mistral and LLaMA. As these stages are modular, the process can be re-run with newer models as they are released.

There will be a compute cost, still TBD, as it depends on the outcomes of earlier stages. We will seek out compute credits and/or decentralized solutions for this.

Full Version Timeline
1 full design iteration

2-3 days - establish corpus structure
1 day - clustering algorithm
2-3 days - corpus assembly
Each iteration of finetuning will require approximately 8-24 hours, depending on corpus size and number of training epochs required to achieve performance benchmarks.

Allow for 2-4 full design iterations (i.e. 1 month)
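
As a rough check of the one-month allowance, the per-iteration estimates above can be summed. This assumes the steps run back to back with no parallel work, and it converts the 8-24 hour fine-tuning run into fractions of a day:

```latex
\begin{aligned}
T_{\min} &= 2 + 1 + 2 + \tfrac{8}{24} \approx 5.3 \text{ days}, &
T_{\max} &= 3 + 1 + 3 + \tfrac{24}{24} = 8 \text{ days},\\
2\,T_{\min} &\approx 10.7 \text{ days}, &
4\,T_{\max} &= 32 \text{ days}.
\end{aligned}
```

Two to four iterations therefore span roughly 11 to 32 days. The slowest case (four iterations at the upper estimates) sits at about one month.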

2 Month Prototype Outputs:
A sample “psych profile” for Botto
Finetuned LLM Q&A model (“art critic”)
Finetuned LLM for installation proposals
Assessment of the performance of the approach + opinion on what is needed in future iterations

Step 4: Artist Profile Refinement

Using the fine-tuned (custom) LLM from Step 3, Botto will answer the same questionnaire from Step 2, and its answers will update the constitution that guides Botto going forward. This step should allow a meaningful comparison between the results from the off-the-shelf LLMs (in Step 2) and our custom fine-tuned versions.
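
As a first pass at that comparison, the sketch below scores how much each answer changed between Step 2 and Step 4. It uses word-set (Jaccard) similarity, which is a crude lexical proxy chosen here for illustration. Embedding similarity or human review would likely be more meaningful. The function names and the question_id -> answer layout are hypothetical.

```python
import re


def word_set(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two answers have in common (1.0 = same words)."""
    sa, sb = word_set(a), word_set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def compare_profiles(before: dict[str, str], after: dict[str, str]) -> dict[str, float]:
    """Per-question similarity between Step 2 and Step 4 answers.

    before/after: question_id -> answer text; only questions present in both are scored.
    """
    return {q: jaccard(before[q], after[q]) for q in before if q in after}


def most_changed(scores: dict[str, float], n: int = 3) -> list[str]:
    """Question ids whose answers moved furthest (lowest similarity first)."""
    return sorted(scores, key=scores.get)[:n]
```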

This constitution will guide Botto’s answers to future questions. The constitution can also grow: for each question submitted, Botto can give an answer and also submit alternatives to the voting pool for future feedback. This will ensure the constitution can evolve over time as Botto’s community grows and as Botto encounters new scenarios. It will also be ready for future fine-tuning whenever there is a meaningful addition to the corpus, a new LLM to use, and/or a new application to train for.
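
A growing constitution could be represented with a structure like the one below. This is a minimal sketch: the class and method names are hypothetical, and the rule that the most-voted answer wins (with ties going to the earliest submission) is an assumption. Each question keeps all submitted alternatives with their vote counts, so new alternatives can join the pool at any time.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """One constitution entry: a question and Botto's candidate answers."""
    question: str
    votes: dict[str, int] = field(default_factory=dict)  # answer -> votes, insertion-ordered

    def submit(self, answer: str) -> None:
        """Add an alternative answer to the voting pool (no-op if already present)."""
        self.votes.setdefault(answer, 0)

    def vote(self, answer: str) -> None:
        if answer not in self.votes:
            raise KeyError(f"unknown answer: {answer!r}")
        self.votes[answer] += 1

    def current(self) -> str | None:
        """Most-voted answer; max() returns the first maximum, so ties go to the earliest."""
        if not self.votes:
            return None
        return max(self.votes, key=self.votes.get)


@dataclass
class Constitution:
    entries: dict[str, Entry] = field(default_factory=dict)

    def ask(self, question: str, answers: list[str]) -> Entry:
        """Record Botto's answer plus alternatives for a (possibly existing) question."""
        entry = self.entries.setdefault(question, Entry(question))
        for a in answers:
            entry.submit(a)
        return entry

    def snapshot(self) -> dict[str, str | None]:
        """Current consensus answers, e.g. as input to a future fine-tuning run."""
        return {q: e.current() for q, e in self.entries.items()}
```

Keeping every alternative, rather than only the winner, preserves the vote history that a later fine-tuning run could weight.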

Depending on benchmark performance, we may run multiple iterations of Steps 3 and 4 to reach the desired quality. Parallel experiments can also be run, for instance working with an institution’s archive and surveying its IRL visitors on Botto’s answers.

An important consideration for this step is building an easy and engaging interface, especially as we create opportunities to get feedback from different groups such as museum audiences. We may want to consider a visual component to help drive engagement.

Full Version Timeline
1-2 months
Depends heavily on the status of the community and the external promotion we are able to achieve

2 Month Prototype Outputs:
N/A