PKU A-LAB · AI for Social Science Series

AI Copilot in Social Science Research:
Simulation, Verification, and Interaction

Le Bao

Massive Data Institute, Georgetown University

May 24, 2024

Generative AI

  • Artificial Intelligence (AI), particularly Generative AI, is rapidly permeating every aspect of society, including social science research.
  • Generative AI: Deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on.
    • Text: ChatGPT, Gemini, LLaMA, Claude, Bloom, Jasper,….
    • Image: DALL-E, Midjourney, Stable Diffusion, Imagen, Parti, ….
    • Speech: Vertex AI, ElevenLabs, Murf AI, Resemble AI, ….
    • Video: Sora, Synthesia, Make-A-Video, Phenaki, Amazon Rekognition, ….
    • Data: SMOTE, ADASYN, Augmenter, ….
  • AI tools and AI-generated data are increasingly being integrated into social science research.

GPT in Political Science

Overview

  • Questions:
    • How can generative AI assist and extend social science research?
    • How can we properly use generative AI for social science research?
  • Today
    • Fundamentals of Generative AI
    • Simulation
    • Verification
    • Interaction
  • Disclosure

How Did AI Become So Good?

  • It didn’t happen overnight.
  • There are three major breakthroughs worth highlighting:
    • Neural network
    • Transformer
    • Training and tuning

Neural Network

  • The Statistical Paradigm Ante 1990
    • Pick a functional form based on domain and nature of the data.
    • Find parameters that (best) fit the data.
    • Gain understanding from examining those parameters.

Neural Network

  • Neural networks are flexible predictors
    • Neural networks are highly parameterized functions.
    • In principle, we can represent (almost) any function we want by setting the neural net parameters the right way.
    • In practice, rather than setting parameters by hand, we put data into the neural net, compare the function it represents with the one the data implicitly represents, and update the parameters.
    • Neural networks are built hierarchically, and plug together like Lego blocks.
    • In particular, neural nets are built compositionally (see the sketch after this list).
      • Depth = levels of compositionality.
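
To make the Lego-block picture concrete, here is a minimal NumPy sketch of a network as a composition of simple layers; the sizes and random weights are illustrative assumptions, not a trained model.

```python
# A three-layer network as a composition of simple building blocks:
# each layer is an affine map followed by a ReLU nonlinearity.
import numpy as np

def layer(x, W, b):
    return np.maximum(0.0, x @ W + b)   # one "Lego block"

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 16)), np.zeros(16)
W3, b3 = rng.normal(size=(16, 1)), np.zeros(1)

x = rng.normal(size=(5, 2))             # five 2-dimensional inputs
h = layer(layer(x, W1, b1), W2, b2)     # depth = levels of compositionality
y_hat = h @ W3 + b3                     # the output of f3(f2(f1(x)))
```

Changing the weights changes the function the network represents; training is the process of finding weights that match the data.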

Neural Network

  • E.g., the largest GPT-3 model has 175B parameters and 96 layers.

Neural Network

  • Neural Network Playground (https://playground.tensorflow.org/)

Neural Network

  • Nonconvex optimization
    • Nonlinear loss function in high-dimensional parameter space
    • Gradient descent
    • Automatic differentiation (see the sketch after this list)
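
A minimal PyTorch sketch of the loop on an assumed toy curve-fitting task: automatic differentiation computes the gradient of a nonlinear loss, and gradient descent moves the parameters downhill.

```python
# Fit y = sin(3x) with a small network: the loss is nonconvex in the
# parameters, autodiff supplies the gradient, and SGD descends along it.
import torch

torch.manual_seed(0)
x = torch.linspace(-2, 2, 100).unsqueeze(1)
y = torch.sin(3 * x)                         # target function to fit

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)

for step in range(2000):
    loss = torch.mean((net(x) - y) ** 2)     # nonlinear, high-dimensional loss
    opt.zero_grad()
    loss.backward()                          # automatic differentiation
    opt.step()                               # one gradient-descent update
```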

Transformer

  • A type of neural network architecture designed to handle sequential data efficiently, particularly in natural language processing (NLP).
  • “Attention is All You Need” by Vaswani et al. (2017)
    • Self-attention allows the model to weigh the importance of different words in a sentence: \text{Attention}(\underbrace{Q}_{\text{queries}}, \underbrace{K}_{\text{keys}}, \underbrace{V}_{\text{values}}) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V
      • E.g., in “The cat sat on the mat because it was tired,” attention helps the model link “it” back to “the cat.”
    • Multiple attention heads run in parallel to capture different aspects of the relationships between words (a sketch of a single head follows this list).
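
A minimal NumPy sketch of one attention head, following the formula above; the shapes and random inputs are illustrative, since a real model computes Q, K, and V from learned projections of the token embeddings.

```python
# Scaled dot-product attention: each word's query is compared with every
# word's key, and the softmax weights mix the corresponding values.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how strongly each word attends to others
    return softmax(scores) @ V        # weighted combination of the values

rng = np.random.default_rng(0)
seq_len, d_k = 10, 8                  # e.g., the tokens of the "cat" sentence
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
out = attention(Q, K, V)              # one head; multi-head runs several in parallel
```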

Training and Tuning

  • Training
    • Generative models such as LLMs are trained on massive datasets.
    • E.g., GPT-3 was trained on data from Common Crawl (410B tokens), WebText2 (19B), Books1 (12B), Books2 (55B), and Wikipedia (3B).
    • Unsupervised learning without explicit labels, typically predicting the next word in a sequence (language modeling) or reconstructing corrupted input (masked language modeling); a sketch of the objective follows this list.
    • This process is now often referred to as pretraining.
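
A minimal sketch of the pretraining objective with toy PyTorch tensors; the random logits are stand-ins for a real model's outputs. The loss is the cross-entropy between the predicted next-word distribution and the word that actually follows.

```python
# Language-modeling loss: at each position, compare the predicted
# distribution over the vocabulary with the actual next token.
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50_000, 6
logits = torch.randn(seq_len, vocab_size)            # stand-in model outputs
next_tokens = torch.randint(vocab_size, (seq_len,))  # the words that follow
loss = F.cross_entropy(logits, next_tokens)          # pretraining minimizes this
```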

Training and Tuning

  • Fine-tuning
    • Fine-tuning involves taking a pretrained model and further training it on a smaller, task-specific dataset.
    • Transfer learning: use the pretrained model as a base, retaining its learned knowledge, and train on a new dataset with supervised learning to adjust the model’s weights.
    • Fine-tune a pretrained language model (e.g., Llama) on domain-specific (e.g., political, medical, legal) texts to improve its performance in generating domain-specific text (see the sketch after this list).
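
A minimal sketch of transfer learning with the Hugging Face transformers library, assuming a small stand-in model (gpt2) and a hypothetical file of domain texts (political_texts.txt):

```python
# Further train a pretrained causal LM on a domain corpus; the pretrained
# weights are retained as the starting point and adjusted by supervision.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

name = "gpt2"                                    # stand-in for, e.g., Llama
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(name)

corpus = load_dataset("text", data_files={"train": "political_texts.txt"})
train = corpus["train"].map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```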

Training and Tuning

  • Reinforcement learning
    • An agent learns an optimal or near-optimal policy that maximizes a “reward function.”
    • ChatGPT uses Reinforcement Learning from Human Feedback (RLHF), which puts human feedback in the training loop to improve performance, i.e., to mimic human preferences.

LLM In a Nutshell

  • An LLM is designed and trained to generate text based on patterns and associations learned from vast datasets.
  • Essentially, LLMs generate text by predicting the next word in a sequence based on the context of the preceding words (see the sketch after this list).
    • E.g., input: “The cat sat on the”; output: “mat because it was tired.”
  • Advantage
    • Contextual understanding
  • Challenges:
    • Incorrect information
    • Context limitation
    • Computational resources
    • Copyrights and ethical concerns
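
The next-word mechanism in miniature, with the small GPT-2 model used only for illustration: the model assigns a probability to every word in its vocabulary, and here we greedily take the most likely one and repeat.

```python
# Continue a prompt one token at a time by repeatedly predicting the
# most likely next token (greedy decoding with GPT-2 for illustration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The cat sat on the", return_tensors="pt").input_ids
for _ in range(5):
    logits = model(ids).logits[0, -1]        # distribution over the next token
    next_id = torch.argmax(logits).view(1, 1)
    ids = torch.cat([ids, next_id], dim=1)   # append and continue
print(tokenizer.decode(ids[0]))
```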

A Big Question

  • Is (generative) artificial intelligence really intelligent?
  • The debate:
    • A “stochastic parrot”
    • “If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.”

“… AI language models aren’t just predicting the next symbol, they’re actually reasoning and understanding in the same way we are, and they’ll continue improving as they get bigger…”

                              Geoffrey Hinton

AI for Simulation

  • Goal: to use AI to simulate human-like behavior and complex systems.
  • Generative content: the model is trained on extensive real-world data.
  • Contextual understanding: it grasps context well, thanks to the model architecture and large training sets.
  • Complexity: it can capture and simulate a wide range of human behaviors and complex scenarios.
  • Domain transferability: it can transfer knowledge from one domain to another based on complex associations.
  • Applications: social behavior, political orientations, economic activities, public health, and more.

Public Opinion and Political Attitudes

  • Argyle, Lisa P. et al. 2023. “Out of One, Many: Using Language Models to Simulate Human Samples.” Political Analysis 31(3): 337–351.
  • Comparing a GPT-3-generated “silicon sample” with human survey responses.
  • Generating synthetic prompts by constructing a first-person backstory based on human survey profiles.

Example Prompt

Ideologically, I describe myself as liberal. Politically, I am a strong Republican. Racially, I am white. I am male. Financially, I am upper-class. In terms of my age, I am young. When I am asked to write down four words that typically describe people who support the Democratic Party, I respond with:
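
A hedged sketch of how such a prompt might be sent to a model today. Argyle et al. used GPT-3’s completion interface; the chat client and model name below are illustrative assumptions, not their exact setup.

```python
# Send the first-person backstory to a chat model and collect the
# completion as one "silicon" response; repeating over many backstories
# (and sampled completions) builds a silicon sample.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
backstory = (
    "Ideologically, I describe myself as liberal. Politically, I am a "
    "strong Republican. Racially, I am white. I am male. Financially, I am "
    "upper-class. In terms of my age, I am young. When I am asked to write "
    "down four words that typically describe people who support the "
    "Democratic Party, I respond with:"
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",                           # illustrative model choice
    messages=[{"role": "user", "content": backstory}],
    temperature=1.0,                               # variation across "respondents"
)
print(resp.choices[0].message.content)
```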

Public Opinion and Political Attitudes

  • The similarity between silicon and human samples is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and sociocultural context that characterizes human attitudes.

Public Opinion and Political Attitudes

  • Bisbee, James, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, et al. 2024. “Synthetic Replacements for Human Survey Data? The Perils of Large Language Models.” Political Analysis. 1–16.
    • The average scores generated by ChatGPT correspond closely to the averages in the baseline survey.
    • Less variation and more extreme attitudes in responses than in the real surveys.
  • Santurkar, Shibani et al. 2023. “Whose Opinions Do Language Models Reflect?” In Proceedings of the 40th International Conference on Machine Learning.
    • GPT-generated samples represent some demographic groups better than others.

Other Applications

  • Economic activities and user behavior (Horton, 2023; Weiss, 2024; Wang et al. 2024; Li, Zhang, & Sun, 2023)
  • Epidemic disease (Ross et al., 2023)
  • International conflict (Hua et al. 2024)
  • Social network (Gao et al., 2023; Törnberg et al., 2023)
    • Törnberg et al. use ANES data to simulate a social media environment.
  • Reasoning & decision making (Cheng and Chin, 2024; Webb, Holyoak, & Lu, 2023; Wang, Chiu, & Chiu, 2023)

Reflections

  • Bias propagation: simulations can perpetuate and amplify biases and stereotypes present in the training data.
  • Understanding vs. mimicking: does AI-generated content reflect human thinking and understanding?
  • Overfitting & repetitiveness: AI tends to generate repetitive or circular content, lacking diversity and innovation in responses.
  • Static nature: can AI predict unseen scenarios that are not covered in the training data?
  • Continuous learning: do we want AI to learn from the prompts/data we feed it?
    • Zero-shot vs. few-shot vs. fine tuning

Bias in Generative AI

  • AI inherits bias from its training data (and possibly also from algorithm design and user interactions).

Bias in the Context of Simulation

  • AI has values, personality, etc. (Yao et al., 2023; Miotto, Rossberg, & Kleinberg, 2022).

GPT is WEIRD

  • Atari et al. (2023) show that GPT’s performance on cognitive psychological tasks most resembles that of people from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies.

Bias in the Context of Simulation

  • To use AI for simulation tasks, we want it to reveal existing differences without amplifying the bias.

AI for Verification

  • AI-assisted verification:
    • Using AI tools to verify data, models, and processes in various domains.
    • AI automates repetitive and time-consuming verification tasks, increasing efficiency and scalability.
    • AI models can reach precision on par with conventional approaches.
    • Examples: medical diagnosis, fraud detection, etc.
  • Verification of AI-generated content
    • Ensuring the accuracy, reliability, and ethical compliance of content generated by AI models.
    • To verify that the information produced by AI is factually accurate.
    • To verify that the content generated by AI is reliable and consistent.

AI-Assisted Verification

  • Use AI to process text documents

Issues of AI-Assisted Verification

  • Black-box models make it difficult to assess the underlying process.
  • Fine-tuning for a specific domain of data
    • Generalizability
  • Verifying large amounts of material can be costly.
    • Economical fine-tuning methods, e.g., Low-Rank Adaptation (LoRA) (Hu et al. 2021); see the sketch after this list.
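
A minimal sketch of LoRA with the peft library, assuming a GPT-2-style base model (the target_modules choice depends on the architecture): only small low-rank adapter matrices are trained, which makes domain fine-tuning far cheaper.

```python
# Wrap a pretrained model so that only low-rank adapters are trainable;
# the original weights stay frozen.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"],  # attention projections in GPT-2
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```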

Verification of AI-Generated Content

  • AI can be wrong: AI-generated content is not always accurate or reliable.
    • If we want to use AI-generated information, we need to verify its accuracy and reliability.

Example: Verifying AI-Generated Measurement

  • Measuring ideologies of legislators
    • DW-NOMINATE: legislators who vote together are placed closer together.

Example: Verifying AI-Generated Measurement

  • Problem:

  • Alternatives: Perceived ideology (Hopkins and Noel, 2022), campaign finance scores (Bonica, 2016).

Example: Verifying AI-Generated Measurement

  • What about using texts (and context)?
  • Wu et al. (2023) use LLMs to scale the ideologies of politicians
    • Asking ChatGPT to place politicians (see the prompt and sketch below).

Prompt

Which senator is more liberal/conservative: (Senator1) or (Senator2)?

  • What if AI is wrong?
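
A hedged sketch of how such pairwise prompts could be collected and tallied; the senators, model, and name matching below are illustrative assumptions, not Wu et al.'s exact pipeline.

```python
# Ask the model to compare every pair of senators and tally "more
# conservative" wins; win shares can then be fed to a scaling model
# (e.g., Bradley-Terry) to recover a latent ideological dimension.
from itertools import combinations
from openai import OpenAI

client = OpenAI()
senators = ["Bernie Sanders", "Susan Collins", "Ted Cruz"]  # illustrative
wins = {s: 0 for s in senators}

for a, b in combinations(senators, 2):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                    # illustrative model choice
        messages=[{"role": "user", "content":
                   f"Which senator is more conservative: {a} or {b}? "
                   "Answer with the name only."}],
        temperature=0,
    )
    answer = resp.choices[0].message.content
    for s in (a, b):
        if s.split()[-1] in answer:             # crude last-name matching
            wins[s] += 1

print(sorted(wins.items(), key=lambda kv: kv[1]))
```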

Verifying AI-Generated Measurement

  • Monte Carlo method
    • Run the model multiple times with the same or slightly varied inputs.
    • Analyze the distribution of the outputs to measure uncertainty (see the sketch after this list).
  • Bootstrap method
    • Repeatedly resample from the dataset with replacement to create multiple new datasets.
  • Prior information
    • Incorporate prior knowledge to improve the accuracy of the output.
  • Caveats:
    • High uncertainty can signal genuine ambiguity in the quantity being measured or unreliability of the model; the two have different implications.
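
A minimal sketch combining the two ideas; query_model is a hypothetical stand-in (simulated here with noise) for a call that sends the prompt to the LLM and parses a numeric score.

```python
# Monte Carlo: repeat the same query and look at the spread of answers.
# Bootstrap: resample those answers to get an interval for the summary.
import numpy as np

rng = np.random.default_rng(0)

def query_model(prompt: str) -> float:
    # Hypothetical stand-in: a real version would call the LLM and parse
    # a numeric score; we simulate a noisy response for the sketch.
    return 50 + rng.normal(0, 5)

scores = np.array([query_model("Place Senator X on a 0-100 scale.")
                   for _ in range(100)])             # Monte Carlo repetitions

boot_means = [rng.choice(scores, size=scores.size, replace=True).mean()
              for _ in range(2000)]                  # bootstrap the mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])      # 95% interval
print(f"estimate {scores.mean():.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
```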

AI for Interactions

  • Many social science phenomena involve interactions but we rarely study interactions directly.
  • Human-human interactions
    • Use AI to analyze conversations, social networks, and group dynamics.
    • Use AI to create interactive environment.

AI for Interactions

  • Simulate and analyze human-human interactions using LLM
  • Studying diverse behaviors and dynamics in controlled environments
    • Behavioral dynamics (Mathur et al., 2024), collaboration (Li, Zhang, & Sun, 2023), social norms (Liu et al., 2023), etc.

AI for Interactions

  • Experimental research on human interactions is becoming increasingly popular, but it is costly and limited.
    • Santoro, Erik, and David E. Broockman. 2022. “The promise and pitfalls of cross-partisan conversations for reducing affective polarization: Evidence from randomized experiments.” Science Advances. 8(25).
    • Combs, Aidan et al. 2023. “Reducing political polarization in the United States with a mobile chat platform.” Nature Human Behaviour. 7(9): 1454–1461.

AI for Interactions

  • AI-powered experiment research
    • AI can replace human participants on one side of an interaction to simulate and study social behaviors.
  • Benefits:
    • Reduces costs and logistical complexities.
    • Enables controlled and consistent interactions.
    • Scales experiments to larger samples and more test scenarios.
    • Combines with other developments in experimental methods: e.g., adaptive design.
      • Rosenzweig, Leah R., and Molly Offer-Westort. 2022. “Conversations with a concern-addressing chatbot increase COVID-19 vaccination intentions among social media users in Kenya and Nigeria.”
    • AI can generate different types of content: text, image, audio, video, etc.

AI for Interactions

  • Argyle, Lisa P. et al. 2023. “Leveraging AI for democratic discourse: Chat interventions can improve online political conversations at scale.” PNAS. 120(41).
    • An AI chat assistant providing real-time, evidence-based suggestions for messages during divisive online political conversations.
    • The AI chat assistant successfully moderated the online chats, improving participants’ perceived quality of the conversation.
    • Participants’ policy positions remained unchanged by the AI intervention, indicating no manipulation.

Conclusion

  • AI tools for social science are still at a rudimentary stage.
    • Most work involves fine-tuning AI models or testing AI performance in social science contexts.
  • Current use of AI should primarily be as a copilot, assisting researchers on well-designed tasks.
    • This requires human oversight on AI tasks.
    • It’s critical to constantly verify AI-generated content.

“Painting Didn’t Die”

  • Upon seeing the first daguerreotype around 1840, the French painter Paul Delaroche (1797-1856) declared: “From this moment, painting is dead.”
  • This led to a cultural and moral panic about the future of labor, the industry of realistic painting, and the fate of art and human creativity itself.
  • Painting did not die that day. What followed was, in fact, one of the most brilliant eras of the arts: the birth of impressionism in the 1860s, followed by cubism, abstract expressionism, and modern art.
  • Photography also became a form of art later.
  • So, the driving force behind the evolution of methods, and their resilience to technological advancement, is creativity.