Programming basics for Biostatistics 6099

OPENAI API basics

Zhiguang Huo (Caleb)

Tuesday Nov 28th, 2023

Outline

Background

Model intuition behind GPT

https://towardsdatascience.com/language-models-gpt-and-gpt-2-8bdb9867c50a

Large language models

generated by chatGPT

Large language models

OPENAI API

  1. Create a new environment and install the openai package:
#pip install --upgrade openai
  2. Obtain an OpenAI API key:

https://platform.openai.com/api-keys

Minimum example to make OPENAI API calls

from openai import OpenAI
client = OpenAI(api_key='sk-...')  # replace 'sk-...' with your own API key
## a better practice is to set the OPENAI_API_KEY environment variable: see https://platform.openai.com/docs/quickstart?context=python

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": "Tell me a joke!"}
  ]
)

print(completion.choices[0].message.content)
## Sure, here you go:
## 
## Why don't scientists trust atoms?
## 
## Because they make up EVERYTHING!
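Instead of pasting the key into your code (where it can leak into version control), a common pattern is to read it from an environment variable, as the quickstart link above suggests. A minimal sketch, assuming you have run `export OPENAI_API_KEY="sk-..."` in your shell first:

```python
import os

# read the key set in your shell, e.g.:  export OPENAI_API_KEY="sk-..."
api_key = os.environ.get("OPENAI_API_KEY")

# the OpenAI client also reads OPENAI_API_KEY automatically,
# so once the variable is set, OpenAI() with no arguments works too
```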

OPENAI models (as of 11/28/2023)

https://platform.openai.com/docs/models

Pricing (as of 11/28/2023)

https://openai.com/pricing

Free tier (for 3 months)

What is a token?

Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens.

More examples on chat and text

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "I am a biostatistician, I am learning programming using python."},
    {"role": "user", "content": "Tell me a joke!"}
  ]
)

print(completion.choices[0].message.content)
## Sure, here's a programming joke for you:
## 
## Why do programmers prefer dark mode?
## 
## Because the light attracts bugs!

More options for client.chat.completions.create

?client.chat.completions.create

Play with temperature/top_p for chat completion

completion = client.chat.completions.create(
  model="gpt-3.5-turbo",
  temperature = 1.5,
  messages=[
    {"role": "user", "content": "Tell me a joke!"}
  ]
)

print(completion.choices[0].message.content)
## Why don't scientists trust atoms? 
## 
## Because they make up everything!
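Intuitively, temperature rescales the model's next-token probability distribution before sampling: values below 1 sharpen it (more deterministic), values above 1 flatten it (more random). A small pure-Python sketch of the idea, using made-up logit values for three hypothetical tokens:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, rescaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                     # hypothetical scores for three tokens
cold = softmax_with_temperature(logits, 0.5) # sharper: top token dominates
hot = softmax_with_temperature(logits, 1.5)  # flatter: more randomness
print(cold)
print(hot)
```

With temperature 0.5 the top token gets most of the probability mass; with temperature 1.5 the distribution is much flatter, which is why high-temperature completions vary more from run to run.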

Examples on image generation

response = client.images.generate(
  model="dall-e-3",
  prompt="an orange cat driving a car in space",
  size="1024x1024",
  quality="standard",
  n=1,
)

image_url = response.data[0].url
image_url 
## 'https://oaidalleapiprodscus.blob.core.windows.net/private/org-Qvud1DXqJ6sIny1VfoCPNQAc/user-3qbiZ0O2GBsVFtX6h9jTtfYn/img-UWL9BkiGcPei5FkKz3vXkpL7.png?st=2023-11-28T14%3A16%3A29Z&se=2023-11-28T16%3A16%3A29Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-11-27T20%3A49%3A34Z&ske=2023-11-28T20%3A49%3A34Z&sks=b&skv=2021-08-06&sig=/5i1GUqWQUH3PKlNMq5Y47Qhv6kD/NWfScQ5JplKp6E%3D'
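Note that the returned URL is temporary (the query string contains expiry timestamps), so you may want to download the image to disk right away. A sketch using only the standard library (the filename is arbitrary):

```python
import urllib.request

def save_image(url, path):
    """Download the generated image to a local file and return the path."""
    urllib.request.urlretrieve(url, path)
    return path

# e.g. save_image(image_url, "orange_cat.png")
```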

Display an image in a Jupyter notebook

from IPython.display import display, Image
display(Image(url=image_url))
## <IPython.core.display.Image object>

Text to speech

response = client.audio.speech.create(
  model="tts-1",
  voice="alloy",
  input="Today is a wonderful day to learn programming basics for Biostatistics!"
)

speech_file_path = '/Users/zhuo/Desktop/speech.mp3'
response.stream_to_file(speech_file_path)
from IPython.display import Audio
Audio(response.read(), autoplay=True)
?client.audio.speech.create

Text to speech

response = client.audio.speech.create(
  model="tts-1",
  voice="echo",
  input="今天是学习生物统计学编程基础的好日子!"  # "Today is a great day to learn programming basics for Biostatistics!"
)
Audio(response.read(), autoplay=True)

Speech to text

speech_file_path = '/Users/zhuo/Desktop/speech.mp3'

with open(speech_file_path, "rb") as audio_file:  # use a context manager so the file is closed
    transcript = client.audio.transcriptions.create(
      model="whisper-1",
      file=audio_file
    )

transcript.text

Microphone to text

You may need to launch Jupyter notebook from a terminal with admin privileges so that the microphone can be accessed:

  1. Open a terminal with the proper environment activated.

  2. Launch Jupyter notebook from the terminal as admin (otherwise microphone access may be denied):

/Users/zhuo/anaconda3/envs/py311/bin/jupyter_mac.command --allow-root

Microphone to text

import pyaudio
import wave

# Audio recording parameters
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = '/Users/zhuo/Desktop/speech2.wav'  # the wave module writes WAV data, so use a .wav extension

audio = pyaudio.PyAudio()

# Start recording
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True,
                    frames_per_buffer=CHUNK)
print("Recording...")
frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("Finished recording.")

# Stop recording
stream.stop_stream()
stream.close()
audio.terminate()

# Save the recorded data as a WAV file

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(audio.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
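To sanity-check a recording, the `wave` module can read the file back and report its duration. The sketch below writes one second of synthetic silence (matching the 16-bit mono, 44100 Hz parameters above) so it is self-contained; the file name is arbitrary:

```python
import wave

def wav_duration(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

# self-contained demo: write one second of silence, then read it back
demo_path = "demo_silence.wav"
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)                    # 16-bit samples, matching pyaudio.paInt16
    wf.setframerate(44100)
    wf.writeframes(b"\x00\x00" * 44100)   # 44100 frames of silence = 1 second

print(wav_duration(demo_path))  # → 1.0
```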

Transcribe the recorded audio to text

speech_file_path = '/Users/zhuo/Desktop/speech2.wav'

with open(speech_file_path, "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
      model="whisper-1",
      file=audio_file
    )

transcript.text

Application Case Studies

References

https://platform.openai.com/docs/overview