PKU A-LAB · AI for Social Science Series

AI Copilot in Social Science Research:
Simulation, Verification, and Interaction

Le Bao

Massive Data Institute, Georgetown University

May 24, 2024

Generative AI

  • Artificial Intelligence (AI), particularly Generative AI, is rapidly permeating every aspect of society, including social science research.
  • Generative AI: Deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on.
    • Text: ChatGPT, Gemini, LLaMA, Claude, Bloom, Jasper,….
    • Image: DALL-E, Midjourney, Stable Diffusion, Imagen, Parti, ….
    • Speech: Vertex AI, ElevenLabs, Murf AI, Resemble AI, ….
    • Video: Sora, Synthesia, Make-A-Video, Phenaki, Amazon Rekognition, ….
    • Data: SMOTE, ADASYN, Augmenter, ….
  • AI tools and AI-generated data are increasingly being integrated into social science research.

GPT in Political Science

Overview

  • Questions:
    • How can generative AI assist and extend social science research?
    • How can we properly use generative AI for social science research?
  • Today
    • Fundamentals of Generative AI
    • Simulation
    • Verification
    • Interaction
  • Disclosure

How Did AI Become So Good?

  • It didn’t happen overnight.
  • There are three major breakthroughs worth highlighting:
    • Neural network
    • Transformer
    • Training and tuning

Neural Network

  • The Statistical Paradigm Ante 1990
    • Pick a functional form based on domain and nature of the data.
    • Find parameters that (best) fit the data.
    • Gain understanding from examining those parameters.

Neural Network

  • Neural networks are flexible predictors
    • Neural networks are highly parameterized functions.
    • In principle, we can represent (almost) any function we want by setting the neural net parameters the right way.
    • In practice, rather than setting parameters by hand, we put data into the neural net, compare the function it represents with the one the data implicitly represents, and update the parameters.
    • Neural networks are built hierarchically, and plug together like Lego blocks.
    • In particular, neural nets are built compositionally (see the sketch after this list).
      • Depth = levels of compositionality.
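
To make the Lego-block picture concrete, here is a minimal NumPy sketch of a network as a composition of simple layers; the sizes and random weights are illustrative assumptions, not a trained model.

```python
# A three-layer network as a composition of simple building blocks:
# each layer is an affine map followed by a ReLU nonlinearity.
import numpy as np

def layer(x, W, b):
    return np.maximum(0.0, x @ W + b)   # one "Lego block"

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 16)), np.zeros(16)
W3, b3 = rng.normal(size=(16, 1)), np.zeros(1)

x = rng.normal(size=(5, 2))             # five 2-dimensional inputs
h = layer(layer(x, W1, b1), W2, b2)     # depth = levels of compositionality
y_hat = h @ W3 + b3                     # the output of f3(f2(f1(x)))
```

Changing the weights changes the function the network represents; training is the process of finding weights that match the data.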

Neural Network

  • E.g., the largest GPT-3 model has 175B parameters and 96 layers.

Neural Network

  • Neural Network Playground (https://playground.tensorflow.org/)

Neural Network

  • Nonconvex optimization
    • Nonlinear loss function in high-dimensional parameter space
    • Gradient descent
    • Automatic differentiation (see the sketch after this list)
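
A minimal PyTorch sketch of the loop on an assumed toy curve-fitting task: automatic differentiation computes the gradient of a nonlinear loss, and gradient descent moves the parameters downhill.

```python
# Fit y = sin(3x) with a small network: the loss is nonconvex in the
# parameters, autodiff supplies the gradient, and SGD descends along it.
import torch

torch.manual_seed(0)
x = torch.linspace(-2, 2, 100).unsqueeze(1)
y = torch.sin(3 * x)                         # target function to fit

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)

for step in range(2000):
    loss = torch.mean((net(x) - y) ** 2)     # nonlinear, high-dimensional loss
    opt.zero_grad()
    loss.backward()                          # automatic differentiation
    opt.step()                               # one gradient-descent update
```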

Transformer

  • A type of neural network architecture designed to handle sequential data efficiently, particularly in natural language processing (NLP).
  • “Attention is All You Need” by Vaswani et al. (2017)
    • Self-attention allows the model to weigh the importance of different words in a sentence: \text{Attention}(\underbrace{Q}_{\text{queries}}, \underbrace{K}_{\text{keys}}, \underbrace{V}_{\text{values}}) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
      • E.g., in “The cat sat on the mat because it was tired,” attention helps the model link “it” back to “the cat.”
    • Multiple attention heads run in parallel to capture different aspects of the relationships between words (a sketch of a single head follows this list).
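
A minimal NumPy sketch of one attention head, following the formula above; the shapes and random inputs are illustrative, since a real model computes Q, K, and V from learned projections of the token embeddings.

```python
# Scaled dot-product attention: each word's query is compared with every
# word's key, and the softmax weights mix the corresponding values.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how strongly each word attends to others
    return softmax(scores) @ V        # weighted combination of the values

rng = np.random.default_rng(0)
seq_len, d_k = 10, 8                  # e.g., the tokens of the "cat" sentence
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out = attention(Q, K, V)              # one head; multi-head runs several in parallel
```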

Training and Tuning

  • Training
    • Generative models such as LLMs are trained on massive datasets.
    • E.g., GPT-3 was trained on data from Common Crawl (410B tokens), WebText2 (19B), Books1 (12B), Books2 (55B), and Wikipedia (3B).
    • Unsupervised learning without explicit labels, typically predicting the next word in a sequence (language modeling) or reconstructing corrupted input (masked language modeling); a sketch of the objective follows this list.
    • This process is now often referred to as pretraining.
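
A minimal sketch of the pretraining objective with toy PyTorch tensors; the random logits are stand-ins for a real model's outputs. The loss is the cross-entropy between the predicted next-word distribution and the word that actually follows.

```python
# Language-modeling loss: at each position, compare the predicted
# distribution over the vocabulary with the actual next token.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50_000, 6
logits = torch.randn(seq_len, vocab_size)            # stand-in model outputs
next_tokens = torch.randint(vocab_size, (seq_len,))  # the words that follow
loss = F.cross_entropy(logits, next_tokens)          # pretraining minimizes this
```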

Training and Tuning

  • Fine-tuning
    • Fine-tuning involves taking a pretrained model and further training it on a smaller, task-specific dataset.
    • Transfer learning: use the pretrained model as a base, retaining its learned knowledge, and train on a new dataset with supervised learning to adjust the model’s weights.
    • Fine-tune a pretrained language model (e.g., Llama) on domain-specific (e.g., political, medical, legal) texts to improve its performance in generating domain-specific text (see the sketch after this list).
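
A minimal sketch of transfer learning with the Hugging Face transformers library, assuming a small stand-in model (gpt2) and a hypothetical file of domain texts (political_texts.txt):

```python
# Further train a pretrained causal LM on a domain corpus; the pretrained
# weights are retained as the starting point and adjusted by supervision.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

name = "gpt2"                                    # stand-in for, e.g., Llama
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(name)

corpus = load_dataset("text", data_files={"train": "political_texts.txt"})
train = corpus["train"].map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```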

Training and Tuning

  • Reinforcement learning
    • An agent learns an optimal or near-optimal policy that maximizes a “reward function.”
    • ChatGPT uses Reinforcement Learning from Human Feedback (RLHF), which puts human feedback in the training loop to improve performance, i.e., to mimic human preferences.

LLM In a Nutshell

  • An LLM is designed and trained to generate text based on patterns and associations learned from vast datasets.
  • Essentially, LLMs generate text by predicting the next word in a sequence based on the context of the preceding words (see the sketch after this list).
    • E.g., input: “The cat sat on the”; output: “mat because it was tired.”
  • Advantage
    • Contextual understanding
  • Challenges:
    • Incorrect information
    • Context limitation
    • Computational resources
    • Copyrights and ethical concerns
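
The next-word mechanism in miniature, with the small GPT-2 model used only for illustration: the model assigns a probability to every word in its vocabulary, and here we greedily take the most likely one and repeat.

```python
# Continue a prompt one token at a time by repeatedly predicting the
# most likely next token (greedy decoding with GPT-2 for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The cat sat on the", return_tensors="pt").input_ids
for _ in range(5):
    logits = model(ids).logits[0, -1]        # distribution over the next token
    next_id = torch.argmax(logits).view(1, 1)
    ids = torch.cat([ids, next_id], dim=1)   # append and continue
print(tokenizer.decode(ids[0]))
```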

A Big Question

  • Is (generative) artificial intelligence really intelligent?
  • The debate:
    • A “stochastic parrot”
    • “If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.”

“… AI language models aren’t just predicting the next symbol, they’re actually reasoning and understanding in the same way we are, and they’ll continue improving as they get bigger…”

                              Geoffrey Hinton

AI for Simulation

  • Goal: to use AI to simulate human-like behavior and complex systems.
  • Generative content: the model is trained on extensive real-world data.
  • Contextual understanding: it grasps context well, thanks to the model architecture and large training sets.
  • Complexity: it can capture and simulate a wide range of human behaviors and complex scenarios.
  • Domain transferability: it can transfer knowledge from one domain to another based on complex associations.
  • Applications: social behavior, political orientations, economic activities, public health, and more.

Public Opinion and Political Attitudes

  • Argyle, Lisa P. et al. 2023. “Out of One, Many: Using Language Models to Simulate Human Samples.” Political Analysis 31(3): 337–351.
  • Comparing a GPT-3-generated “silicon sample” with human survey responses.
  • Generating synthetic prompts by constructing a first-person backstory based on human survey profiles.

Example Prompt

Ideologically, I describe myself as liberal. Politically, I am a strong Republican. Racially, I am white. I am male. Financially, I am upper-class. In terms of my age, I am young. When I am asked to write down four words that typically describe people who support the Democratic Party, I respond with:
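
A hedged sketch of how such a prompt might be sent to a model today. Argyle et al. used GPT-3’s completion interface; the chat client and model name below are illustrative assumptions, not their exact setup.

```python
# Send the first-person backstory to a chat model and collect the
# completion as one "silicon" response; repeating over many backstories
# (and sampled completions) builds a silicon sample.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
backstory = (
    "Ideologically, I describe myself as liberal. Politically, I am a "
    "strong Republican. Racially, I am white. I am male. Financially, I am "
    "upper-class. In terms of my age, I am young. When I am asked to write "
    "down four words that typically describe people who support the "
    "Democratic Party, I respond with:"
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",                           # illustrative model choice
    messages=[{"role": "user", "content": backstory}],
    temperature=1.0,                               # variation across "respondents"
)
print(resp.choices[0].message.content)
```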

Public Opinion and Political Attitudes

  • The similarity between silicon and human samples is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and sociocultural context that characterizes human attitudes.

Public Opinion and Political Attitudes

  • Bisbee, James, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, et al. 2024. “Synthetic Replacements for Human Survey Data? The Perils of Large Language Models.” Political Analysis. 1–16.
    • The average scores generated by ChatGPT correspond closely to the averages in the baseline survey.
    • Less variation and more extreme attitudes in responses than in the real surveys.
  • Santurkar, Shibani et al. 2023. “Whose Opinions Do Language Models Reflect?” In Proceedings of the 40th International Conference on Machine Learning.
    • GPT-generated samples represent some demographic groups better than others.

Other Applications

  • Economic activities and user behavior (Horton, 2023; Weiss, 2024; Wang et al. 2024; Li, Zhang, & Sun, 2023)
  • Epidemic disease (Ross et al., 2023)
  • International conflict (Hua et al. 2024)
  • Social network (Gao et al., 2023; Törnberg et al., 2023)
    • Törnberg et al. use ANES data to simulate a social media environment.
  • Reasoning & decision making (Cheng and Chin, 2024; Webb, Holyoak, & Lu, 2023; Wang, Chiu, & Chiu, 2023)

Reflections

  • Bias propagation: simulations can perpetuate and amplify biases and stereotypes present in the training data.
  • Understanding vs. mimicking: does AI-generated content reflect human thinking and understanding?
  • Overfitting & repetitiveness: AI tends to generate repetitive or circular content, lacking diversity and innovation in responses.
  • Static nature: can AI predict unseen scenarios that are not covered in the training data?
  • Continuous learning: do we want AI to learn from the prompts/data we feed it?
    • Zero-shot vs. few-shot vs. fine tuning

Bias in Generative AI

  • AI inherits bias from its training data (and possibly also from algorithm design and user interactions).

Bias in the Context of Simulation

  • AI has values, personality, etc. (Yao et al., 2023; Miotto, Rossberg, & Kleinberg, 2022).

GPT is WEIRD

  • Atari et al. (2023) show that GPT’s performance on cognitive psychological tasks most resembles that of people from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies.

Bias in the Context of Simulation

  • To use AI for simulation tasks, we want it to reveal existing differences without amplifying the bias.

AI for Verification

  • AI-assisted verification:
    • Using AI tools to verify data, models, and processes in various domains.
    • AI automates repetitive and time-consuming verification tasks, increasing efficiency and scalability.
    • AI models can reach precision on par with conventional approaches.
    • Examples: medical diagnosis, fraud detection, etc.
  • Verification of AI-generated content
    • Ensuring the accuracy, reliability, and ethical compliance of content generated by AI models.
    • To verify that the information produced by AI is factually accurate.
    • To verify that the content generated by AI is reliable and consistent.

AI-Assisted Verification

  • Use AI to process text documents

Issues of AI-Assisted Verification

  • Black-box models make it difficult to assess the underlying process.
  • Fine-tuning for a specific domain of data
    • Generalizability
  • Verifying large amounts of material can be costly.
    • Economical fine-tuning methods, e.g., Low-Rank Adaptation (LoRA) (Hu et al. 2021); see the sketch after this list.
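
A minimal sketch of LoRA with the peft library, assuming a GPT-2-style base model (the target_modules choice depends on the architecture): only small low-rank adapter matrices are trained, which makes domain fine-tuning far cheaper.

```python
# Wrap a pretrained model so that only low-rank adapters are trainable;
# the original weights stay frozen.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"],  # attention projections in GPT-2
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```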

Verification of AI-Generated Content

  • AI can be wrong: AI-generated content is not always accurate or reliable.
    • If we want to use AI-generated information, we need to verify its accuracy and reliability.

Example: Verifying AI-Generated Measurement

  • Measuring ideologies of legislators
    • DW-NOMINATE: legislators who vote together are placed closer together.

Example: Verifying AI-Generated Measurement

  • Problem:

  • Alternatives: Perceived ideology (Hopkins and Noel, 2022), campaign finance scores (Bonica, 2016).

Example: Verifying AI-Generated Measurement

  • What about using texts (and context)?
  • Wu et al. (2023) use LLMs to scale the ideologies of politicians
    • Asking ChatGPT to place politicians (see the prompt and sketch below).

Prompt

Which senator is more liberal/conservative: (Senator1) or (Senator2)?

  • What if AI is wrong?
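
A hedged sketch of how such pairwise prompts could be collected and tallied; the senators, model, and name matching below are illustrative assumptions, not Wu et al.'s exact pipeline.

```python
# Ask the model to compare every pair of senators and tally "more
# conservative" wins; win shares can then be fed to a scaling model
# (e.g., Bradley-Terry) to recover a latent ideological dimension.
from itertools import combinations
from openai import OpenAI

client = OpenAI()
senators = ["Bernie Sanders", "Susan Collins", "Ted Cruz"]  # illustrative
wins = {s: 0 for s in senators}

for a, b in combinations(senators, 2):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                    # illustrative model choice
        messages=[{"role": "user", "content":
                   f"Which senator is more conservative: {a} or {b}? "
                   "Answer with the name only."}],
        temperature=0,
    )
    answer = resp.choices[0].message.content
    for s in (a, b):
        if s.split()[-1] in answer:             # crude last-name matching
            wins[s] += 1

print(sorted(wins.items(), key=lambda kv: kv[1]))
```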

Verifying AI-Generated Measurement

  • Monte Carlo method
    • Run the model multiple times with the same or slightly varied inputs.
    • Analyze the distribution of the outputs to measure uncertainty (see the sketch after this list).
  • Bootstrap method
    • Repeatedly resample from the dataset with replacement to create multiple new datasets.
  • Prior information
    • Incorporate prior knowledge to improve the accuracy of the output.
  • Caveats:
    • High uncertainty can signal genuine ambiguity in the quantity being measured or unreliability of the model; the two have different implications.
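
A minimal sketch combining the two ideas; query_model is a hypothetical stand-in (simulated here with noise) for a call that sends the prompt to the LLM and parses a numeric score.

```python
# Monte Carlo: repeat the same query and look at the spread of answers.
# Bootstrap: resample those answers to get an interval for the summary.
import numpy as np

rng = np.random.default_rng(0)

def query_model(prompt: str) -> float:
    # Hypothetical stand-in: a real version would call the LLM and parse
    # a numeric score; we simulate a noisy response for the sketch.
    return 50 + rng.normal(0, 5)

scores = np.array([query_model("Place Senator X on a 0-100 scale.")
                   for _ in range(100)])             # Monte Carlo repetitions

boot_means = [rng.choice(scores, size=scores.size, replace=True).mean()
              for _ in range(2000)]                  # bootstrap the mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])      # 95% interval
print(f"estimate {scores.mean():.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```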

AI for Interactions

  • Many social science phenomena involve interactions but we rarely study interactions directly.
  • Human-human interactions
    • Use AI to analyze conversations, social networks, and group dynamics.
    • Use AI to create interactive environment.

AI for Interactions

  • Simulate and analyze human-human interactions using LLM
  • Studying diverse behaviors and dynamics in controlled environments
    • Behavioral dynamics (Mathur et al., 2024), collaboration (Li, Zhang, & Sun, 2023), social norms (Liu et al., 2023), etc.

AI for Interactions

  • Experimental research on human interactions is becoming increasingly popular, but it is costly and limited.
    • Santoro, Erik, and David E. Broockman. 2022. “The promise and pitfalls of cross-partisan conversations for reducing affective polarization: Evidence from randomized experiments.” Science Advances. 8(25).
    • Combs, Aidan et al. 2023. “Reducing political polarization in the United States with a mobile chat platform.” Nature Human Behaviour. 7(9): 1454–1461.

AI for Interactions

  • AI-powered experiment research
    • AI can replace human participants on one side of an interaction to simulate and study social behaviors.
  • Benefits:
    • Reduces costs and logistical complexities.
    • Enables controlled and consistent interactions.
    • Scales experiments to larger samples and more test scenarios.
    • Combines with other developments in experimental methods: e.g., adaptive design.
      • Rosenzweig, Leah R., and Molly Offer-Westort. 2022. “Conversations with a concern-addressing chatbot increase COVID-19 vaccination intentions among social media users in Kenya and Nigeria.”
    • AI can generate different types of content: text, image, audio, video, etc.

AI for Interactions

  • Argyle, Lisa P. et al. 2023. “Leveraging AI for democratic discourse: Chat interventions can improve online political conversations at scale.” PNAS. 120(41).
    • An AI chat assistant providing real-time, evidence-based suggestions for messages during divisive online political conversations.
    • The AI chat assistant successfully moderated the online chats, improving participants’ perceived quality of the conversation.
    • Participants’ policy positions remained unchanged by the AI intervention, indicating no manipulation.

Conclusion

  • AI tools for social science are still at a rudimentary stage.
    • Most work involves fine-tuning AI models or testing AI performance in social science contexts.
  • Current use of AI should primarily be as a copilot, assisting researchers on well-designed tasks.
    • This requires human oversight on AI tasks.
    • It’s critical to constantly verify AI-generated content.

“Painting Didn’t Die”

  • Upon seeing the first daguerreotype around 1840, the French painter Paul Delaroche (1797-1856) declared: “From this moment, painting is dead.”
  • This led to a cultural and moral panic about the future of labor, the industry of realistic painting, and the fate of art and human creativity itself.
  • Painting did not die that day. What followed was, in fact, one of the most brilliant eras of the arts: the birth of impressionism in the 1860s, followed by cubism, abstract expressionism, and modern art.
  • Photography also became a form of art later.
  • So, the driving force behind the evolution of methods, and their resilience to technological advancement, is creativity.