Programming basics for Biostatistics 6099

Basics of Python programming (part 2)

Zhiguang Huo (Caleb)

Thursday Oct 12th, 2023

Outline

Control flows
Loops
Basic string operators
file operations
exceptions
datetime
more on python data structure
- list
- dictionary
- tuple

Control flows

num = input("Enter a number: ")
if int(num) > 0:
    print(f"{num} is positive")

A colon (:) is required at the end of the if line.
Indent (e.g., 2 or 4 spaces) the chunk of code to be executed.
Unlike R, a Python if statement needs no parentheses around the condition and no curly braces around the body. Indentation determines where the code chunk ends.

Indentation

Indentation serves another purpose other than code readability
Python treats the statements with the same indentation level (statements with an equal number of whitespaces before them) as a single code block.
Commonly used indent
- 2 whitespaces
- 4 whitespaces
- 1 tab (not recommended)
This indentation rule applies to flow control, loops, functions, etc.
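As a minimal sketch of how indentation levels nest, the example below places an if/elif/else inside a function and calls the function from a loop. The function name sign_label and the sample values are made up for illustration.

```python
def sign_label(x):
    # Everything indented under "def" is the function body.
    if x > 0:
        # One level deeper: this line belongs to the "if" branch only.
        label = "positive"
    elif x < 0:
        label = "negative"
    else:
        label = "zero"
    # Back at the function-body level: runs after the if/elif/else block.
    return label


labels = []
for value in [3, -2, 0]:
    labels.append(sign_label(value))  # indented: repeated for each value

print(labels)  # not indented: runs once, after the loop ends
```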

if else elif

num = input("Enter a number: ")
if int(num) > 0:
    print(f"{num} is positive")
else:
    print(f"{num} is not positive")

num = input("Enter a number: ")
if int(num) > 0:
    print("The number is positive")
elif int(num) < 0:
    print("The number is negative")
else:
    print("The number is zero")

if else same line

original

number = input("Please enter a number: ")
if int(number) % 2 == 0:
    print("even")
else:
    print("odd")

one line version

number = input("Please enter a number: ")
print("even" if int(number) % 2 == 0 else "odd")
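The one-line form is a conditional expression: it produces a value, so the result can be stored or returned instead of printed. A small sketch, where the helper name parity is hypothetical:

```python
def parity(n):
    # General shape: value_if_true if condition else value_if_false
    return "even" if n % 2 == 0 else "odd"


print(parity(4))  # even
print(parity(7))  # odd
```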

True or False conditions

1>2

## False

4 == 5

## False

"ap" in "apple"

## True

"apple" in ["apple", "orange"]

## True

True and False

## False

True or False

## True

not True

## False

True or False conditions

We can chain comparison operators such as < and > to connect a series of comparisons.

a = 4
b = 6
c = 9

a < b and b < c

## True

a < b < c

## True

a < c > b

## True
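A chained comparison is evaluated pairwise and joined with and, so a < c > b means a < c and c > b; it says nothing about how a and b compare with each other. The sketch below also shows a common range-check use; the helper name in_unit_interval is an assumption for illustration.

```python
a, b, c = 4, 6, 9

# The chain is shorthand for two comparisons joined by "and".
chained = a < c > b
expanded = (a < c) and (c > b)
print(chained == expanded)  # True


def in_unit_interval(p):
    # Typical use: check that a probability lies in [0, 1].
    return 0 <= p <= 1


print(in_unit_interval(0.25))  # True
print(in_unit_interval(1.5))   # False
```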

match (available for python >= 3.10)

To select one case from multiple choices

status = 400    

match status:
    case 400:
        print("Bad request")
    case 404:
        print("Not found")
    case 418:
        print("I'm a teapot")
    case _:
        print("Something's wrong with the internet")

## Bad request
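For Python versions older than 3.10, where match is not available, a simple value lookup like the one above can be approximated with a dictionary and get(). This is a swapped-in alternative, not the match statement itself, and the function name describe_status is an assumption for illustration.

```python
def describe_status(status):
    messages = {
        400: "Bad request",
        404: "Not found",
        418: "I'm a teapot",
    }
    # The second argument of get() plays the role of "case _".
    return messages.get(status, "Something's wrong with the internet")


print(describe_status(400))  # Bad request
print(describe_status(500))  # Something's wrong with the internet
```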

for loops

words = ["cat", "dog", "gator"]
for w in words:
    print(w)

## cat
## dog
## gator

words = ["cat", "dog", "gator"]
for w in words:
    print(f"{w} has {len(w)} letters in it.")

## cat has 3 letters in it.
## dog has 3 letters in it.
## gator has 5 letters in it.

range() function

for i in range(3):
    print(i)

## 0
## 1
## 2

range(n) creates an iterable object.
list(iterable) converts an iterable to a list.
range(n) is exclusive: it does not include the last number n.
range(start, stop) creates the sequence of numbers from start to stop - 1; start defaults to 0.
For example, list(range(5)) produces [0, 1, 2, 3, 4].

list(range(3))

## [0, 1, 2]

list(range(3,7))

## [3, 4, 5, 6]

range() function

range with a step size other than 1.

list(range(3,8,2))

## [3, 5, 7]

list(range(7,2,-2))

## [7, 5, 3]
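A range object does not store its numbers; it computes them on demand, yet it still supports len(), indexing, and membership tests. A brief sketch:

```python
r = range(3, 8, 2)  # represents 3, 5, 7 without building a list

print(len(r))   # 3
print(r[1])     # 5
print(7 in r)   # True
print(8 in r)   # False: the stop value is never included
print(list(r))  # [3, 5, 7]
```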

range over a list

words = ["cat", "dog", "gator"]
for i in range(len(words)):
    print(i, words[i])

## 0 cat
## 1 dog
## 2 gator
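When both the position and the element are needed, the built-in enumerate() is a common alternative to range(len(...)). A minimal sketch with the same list; the pairs list is added only so the result can be inspected afterwards.

```python
words = ["cat", "dog", "gator"]

pairs = []
for i, w in enumerate(words):
    # enumerate yields (index, element) pairs, starting at index 0
    print(i, w)
    pairs.append((i, w))
```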

break

for num in range(1, 10):
  if num % 5 == 0:
    print(f"{num} can be divided by 5")
    break
  print(f"{num} cannot be divided by 5")

## 1 cannot be divided by 5
## 2 cannot be divided by 5
## 3 cannot be divided by 5
## 4 cannot be divided by 5
## 5 can be divided by 5

continue

for num in range(1, 10):
  if num % 5 == 0:
    continue
  print(f"{num} cannot be divided by 5")

## 1 cannot be divided by 5
## 2 cannot be divided by 5
## 3 cannot be divided by 5
## 4 cannot be divided by 5
## 6 cannot be divided by 5
## 7 cannot be divided by 5
## 8 cannot be divided by 5
## 9 cannot be divided by 5

pass

In Python, pass is the null statement.
It is a placeholder for functionality to be added later.
pass does nothing.

sequence = {'p', 'a', 's', 's'}
for val in sequence:
    pass

a = 33
b = 200

if b > a:
  pass
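A typical use of pass is to sketch the structure of a program before the details are written. In the example below the function name fit_model is hypothetical; because its body is only pass, calling it returns None.

```python
def fit_model(data):
    pass  # placeholder: the real implementation will be added later


result = fit_model([1.2, 3.4, 5.6])
print(result)  # None
```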

while loop

num = 1
while num<10:
  if num % 5 == 0:
    print(f"{num} can be divided by 5")
    break
  print(f"{num} cannot be divided by 5")
  num+=1

## 1 cannot be divided by 5
## 2 cannot be divided by 5
## 3 cannot be divided by 5
## 4 cannot be divided by 5
## 5 can be divided by 5

num = 0
while num<10:
  num+=1
  if num % 5 == 0:
    continue
  print(f"{num} cannot be divided by 5")

## 1 cannot be divided by 5
## 2 cannot be divided by 5
## 3 cannot be divided by 5
## 4 cannot be divided by 5
## 6 cannot be divided by 5
## 7 cannot be divided by 5
## 8 cannot be divided by 5
## 9 cannot be divided by 5

Basic string operators

find
index
count
join
split
lower
upper
title
replace
strip

find

title = "I love programming basics for Biostatistics!"
title.find("I")

## 0

title.find("love")

## 2

title.find("o")

## 3

title.find("o", 4) ## start searching at index 4

## 9

title.find("XX")

## -1

find() and index() are identical except when the substring is not found
- find() returns -1
- index() raises a ValueError

title.index("love")

## 2

title.index("XX") ## raises ValueError: substring not found
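Which method to use depends on how a missing substring should be handled: find() lets the code test for -1, while index() requires catching the error. A sketch of both styles (try/except is covered in the exceptions part of these notes); the variable names pos and raised are for illustration only.

```python
title = "I love programming basics for Biostatistics!"

# Style 1: find() returns -1 when the substring is absent.
pos = title.find("XX")
if pos == -1:
    print("find: not found")

# Style 2: index() raises ValueError when the substring is absent.
try:
    title.index("XX")
    raised = False
except ValueError:
    raised = True
    print("index: not found")
```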

pattern detection

title = "I love programming basics for Biostatistics!"
"love" in title

## True

"computing" in title

## False

"XX" in title

## False

title.endswith("computing!")

## False

title.startswith("I love")

## True

title.count("l")

## 1

join

seq = ["1", "2", "3", "4", "5"]
sep = "+"
sep.join(seq)

## '1+2+3+4+5'

"".join(seq)

## '12345'

dirs =( "", "usr", "bin", "env")
"/".join(dirs)

## '/usr/bin/env'

print("C:" + "\\".join(dirs)) ## a backslash is the escape character, so a literal backslash is written as \\

## C:\usr\bin\env

split

split is the inverse operation of join.

longSeq = "1+2+3+4+5"
longSeq.split("+")

## ['1', '2', '3', '4', '5']

longSeq.split("3")

## ['1+2+', '+4+5']

"Using the default value".split()

## ['Using', 'the', 'default', 'value']
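Because split and join undo each other, they are often combined to edit a delimited record. A small sketch with a made-up comma-separated line:

```python
record = "id,age,group"  # hypothetical header line

fields = record.split(",")  # ['id', 'age', 'group']
fields.append("outcome")    # edit the pieces as a list
new_record = ",".join(fields)

print(new_record)  # id,age,group,outcome

# Splitting and joining with the same separator restores the original string.
print(",".join(record.split(",")) == record)  # True
```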

lower, upper, title

sentence = "I like programming basics for Biostatistics!"
sentence.lower()

## 'i like programming basics for biostatistics!'

sentence.upper()

## 'I LIKE PROGRAMMING BASICS FOR BIOSTATISTICS!'

sentence.title()

## 'I Like Programming Basics For Biostatistics!'

sentence.islower() ## False
sentence.isupper() ## False
sentence.istitle() ## False

strip

strip() removes leading characters (at the beginning) and trailing characters (at the end) of a string.
By default, the characters removed are whitespace.
Internal whitespace is kept.

a = "   internal   whitespace is kept     "
a.strip()

## 'internal   whitespace is kept'

b = "*** SPAM * for * everyone!!! ***"
b.strip(" *!")

## 'SPAM * for * everyone'

strip() also removes newline characters at both ends

c = "\na\nb\n\n\nc\n\n"
c.strip()

## 'a\nb\n\n\nc'
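To trim only one side, Python also provides lstrip() and rstrip(). A brief sketch with a made-up string:

```python
s = "  hello world  "

left_trimmed = s.lstrip()   # removes leading whitespace only
right_trimmed = s.rstrip()  # removes trailing whitespace only

print(repr(left_trimmed))   # 'hello world  '
print(repr(right_trimmed))  # '  hello world'

# Stripping both sides in sequence gives the same result as strip().
print(s.lstrip().rstrip() == s.strip())  # True
```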

replace

a = "This is a cat!"
a.replace("This", "That")

## 'That is a cat!'

a.replace("is", "eez")

## 'Theez eez a cat!'
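replace() returns a new string and leaves the original unchanged; an optional third argument limits the number of replacements. A brief sketch:

```python
a = "This is a cat!"

first_only = a.replace("is", "eez", 1)  # replace only the first match
print(first_only)  # Theez is a cat!

# Strings are immutable: the original is untouched.
print(a)  # This is a cat!
```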

file operation (read)

https://caleb-huo.github.io/teaching/data/misc/my_file.txt

Open the file, display its contents, and close it (to release the file handle)

file = open("my_file.txt")
contents = file.read()
print(contents)

## Hello, my name is Caleb. Hello World!
## I like computing

file.close()

Alternative approach with no explicit closing step: with closes the file automatically when the block ends

with open("my_file.txt") as file:
    contents = file.read()
    print(contents)

## Hello, my name is Caleb. Hello World!
## I like computing

file operation (read)

readlines()
- reads multiple lines
- saves the result in a list
  - each element of the list contains a line

myfile = "my_file.txt"
with open(myfile) as file:
    lines = file.readlines()

for aline in lines:
    print(aline.strip())

## Hello, my name is Caleb. Hello World!
## I like computing
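A file object can also be looped over directly, which reads one line at a time instead of loading all lines into a list first. The sketch below writes a small temporary file so that it is self-contained; the file name demo_file.txt and its contents are assumptions for illustration.

```python
import os
import tempfile

# Create a small file to read (a stand-in for my_file.txt).
path = os.path.join(tempfile.mkdtemp(), "demo_file.txt")
with open(path, mode="w") as file:
    file.write("Hello World!\nI like computing\n")

lines = []
with open(path) as file:
    for aline in file:               # one line per iteration
        lines.append(aline.strip())  # strip() drops the trailing "\n"

print(lines)  # ['Hello World!', 'I like computing']
```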

file operation (write)

write to file (overwrite original file)

with open("new_file.txt", mode="w") as file:
    file.write("I like python!")

## 14

append to file (append at the end of the original file)

with open("new_file.txt", mode="a") as file:
    file.write("We like python!")

## 15
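Note that write() does not add a newline on its own, so consecutive writes run together unless "\n" is included. A self-contained sketch, using a temporary directory so that no existing file is overwritten (the path is an assumption for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "new_file.txt")

with open(path, mode="w") as file:  # "w" creates or overwrites the file
    file.write("I like python!\n")  # explicit newline

with open(path, mode="a") as file:  # "a" appends to the end
    file.write("We like python!\n")

with open(path) as file:
    contents = file.read()

print(contents)
```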

Exceptions

FileNotFoundError

with open("a_file.txt") as file:
    file.read()

IndexError

fruit_list = ["Apple", "Banana", "Pear"]
fruit_list[3]

TypeError

text = "abc"
print(text + 5)

raise an error

raise TypeError("This is an error that I made up!")

Handle exceptions

Errors (exceptions) raised inside try are handled by the matching except.
After the exception is handled, the program keeps executing.

file = None
try:
    file = open("a_file.txt")
    print(1 + "2")
except FileNotFoundError:
    print("Catch FileNotFoundError")
except TypeError as error_message:
    print(f"Here is the error: {error_message}.")
else:
    content = file.read()
    print(content)
finally: ## runs no matter what happens
    if file is not None: ## file is still None when open() failed
        file.close()
    print("Cleanup finished.")

## Catch FileNotFoundError
## Cleanup finished.
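The same try/except/else pattern is useful for validating input. In this sketch the helper name to_int is hypothetical; it returns None instead of stopping the program when the text is not a valid integer.

```python
def to_int(text):
    try:
        value = int(text)
    except ValueError:
        # int() raises ValueError for text such as "abc"
        return None
    else:
        # runs only when no exception was raised
        return value


print(to_int("42"))   # 42
print(to_int("abc"))  # None
```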

datetime

The datetime module supplies classes for manipulating dates and times.

import datetime as dt
now = dt.date.today() ## date only
now.year

## 2023

now.month

## 10

now.day

## 12

# now.weekday()

birthday = dt.date(1995, 7, 31)
age = now - birthday
age.days

## 10300
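Subtracting two dates gives a timedelta, and a timedelta can be added back to a date. The sketch uses fixed dates (rather than today()) so the result is reproducible; the 30-day follow-up interval is made up for illustration.

```python
import datetime as dt

visit = dt.date(2023, 10, 12)
follow_up = visit + dt.timedelta(days=30)  # shift a date by 30 days
print(follow_up)  # 2023-11-11

gap = follow_up - visit  # date - date gives a timedelta
print(gap.days)  # 30
```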

datetime

now = dt.datetime.now()
now.year

## 2023

now.month

## 10

now.day

## 12

now.hour

## 13

now.minute

## 18

now.second

## 59

now.microsecond

## 756676

now.weekday()

## 3

datetime

now = dt.datetime.now()

print(f'{now:%Y-%m-%d %H:%M}')

## 2023-10-12 13:18
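The same format codes work with strftime() (datetime to string) and strptime() (string to datetime). A sketch with a fixed timestamp chosen to match the output above:

```python
import datetime as dt

stamp = dt.datetime(2023, 10, 12, 13, 18)

text = stamp.strftime("%Y-%m-%d %H:%M")  # datetime -> string
print(text)  # 2023-10-12 13:18

parsed = dt.datetime.strptime(text, "%Y-%m-%d %H:%M")  # string -> datetime
print(parsed == stamp)  # True
```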

more on python data structure

list
dictionary
tuple

list: creation and assignment

create a list from a string

list("Hello")

## ['H', 'e', 'l', 'l', 'o']

create a list directly

x = [1, 1, 1]
x

## [1, 1, 1]

change element

x = [1, 2, 3]
x[1] = 0
x

## [1, 0, 3]

list: deletion and slice assignment

names = ["Alice", "Beth", "Carl", "Dan", "Emily"]
names

## ['Alice', 'Beth', 'Carl', 'Dan', 'Emily']

del names[2]
names

## ['Alice', 'Beth', 'Dan', 'Emily']

slice assignment

names = list("Lucas")
names[3:] = list("ky")
names

## ['L', 'u', 'c', 'k', 'y']

"".join(names)

## 'Lucky'

slice assignment can be unequal length

names = list("Lucas")
names[1:] = list("emonade")
"".join(names)

## 'Lemonade'

slice assignment can be used as insertion or deletion

numbers = [1, 5]
numbers[1:1] = [2, 3, 4]
numbers

## [1, 2, 3, 4, 5]

numbers = list(range(1,6))
numbers

## [1, 2, 3, 4, 5]

numbers[1:4] = []
numbers

## [1, 5]

list: append and count

append

alist = [0,1,2]
alist.append(3)
alist

## [0, 1, 2, 3]

count

asentence = "to be or not to be"
alist = asentence.split()
alist.count("to")

## 2

x = [[1,2], 1, 2, 1, [2, 1, [1,2]]]
x.count(1)

## 2

x.count([1,2])

## 1

list: extend

extend (recommended for efficiency and readability)

a = [0,1,2]; b = [3,4,5]
a.extend(b)
a

## [0, 1, 2, 3, 4, 5]

direct + (creates a new list; a itself is unchanged)

a = [0,1,2]; b = [3,4,5]
a + b

## [0, 1, 2, 3, 4, 5]

a

## [0, 1, 2]

slice assignment

a = [0,1,2]; b = [3,4,5]
a[len(a):] = b
a

## [0, 1, 2, 3, 4, 5]
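The practical difference between these approaches is that extend modifies the existing list, whereas + builds a new one. The sketch below checks this with the is operator; the variable names alias and combined are for illustration only.

```python
a = [0, 1, 2]
b = [3, 4, 5]
alias = a  # second name for the same list

combined = a + b      # new list; a is unchanged
print(a)              # [0, 1, 2]
print(combined is a)  # False

a.extend(b)           # modifies a in place
print(alias)          # [0, 1, 2, 3, 4, 5]
print(alias is a)     # True
```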

list: index

index: will return the first match

asentence = "to be or not to be"
alist = asentence.split()
alist

## ['to', 'be', 'or', 'not', 'to', 'be']

alist.index("to")

## 0

alist.index("not")

## 3

alist[3]

## 'not'

alist.index("XX")

list: insert

insert

alist = [1,2,3,5,6]
alist.insert(3, "four")
alist

## [1, 2, 3, 'four', 5, 6]

insert with slice assignment

alist = [1,2,3,5,6]
alist[3:3] = ["four"]
alist

## [1, 2, 3, 'four', 5, 6]

list: pop

pop: remove and return the last element of a list
- opposite of append

x = list(range(10))
x.pop()

## 9

x

## [0, 1, 2, 3, 4, 5, 6, 7, 8]

x.pop()

## 8

x

## [0, 1, 2, 3, 4, 5, 6, 7]

list: remove

remove

asentence = "to be or not to be"
alist = asentence.split()
alist

## ['to', 'be', 'or', 'not', 'to', 'be']

alist.remove("to")
alist

## ['be', 'or', 'not', 'to', 'be']

alist.remove("XX")

compare pop and remove
- remove has no return value; it removes the first occurrence of a given value
- pop returns a value; by default it removes and returns the last element of the list
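pop() also accepts an index, in which case it removes and returns the element at that position. A brief sketch with a made-up list:

```python
x = ["a", "b", "c", "d"]

first = x.pop(0)  # remove and return the element at index 0
print(first)      # a
print(x)          # ['b', 'c', 'd']

last = x.pop()    # no index: remove and return the last element
print(last)       # d
print(x)          # ['b', 'c']
```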

list: reverse and sort

reverse

x = ["a", "b", "c"]
x.reverse()
x

## ['c', 'b', 'a']

sort: the sort method sorts the list in place and returns None

x = [5, 3, 4]
x.sort() 
x

## [3, 4, 5]

y = ["b", "c", "a"]
y.sort()
y

## ['a', 'b', 'c']

no return values:

x = [5, 3, 4]
y = x.sort() 
print(y)

## None

with return values:

x = [5, 3, 4]
y = sorted(x) 
print(y)

## [3, 4, 5]

list: sort

sort

x = [5, 3, 4]
y = x ## x and y are pointing to the same list
y.sort() 

print(x)

## [3, 4, 5]

print(y)

## [3, 4, 5]

x = [5, 3, 4]
y = x[:] ## x[:] creates a copy of x, so y refers to a new list
y.sort() 

print(x)

## [5, 3, 4]

print(y)

## [3, 4, 5]
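Whether two names refer to the same list can be checked with the is operator, while == compares only the contents. A short sketch; the names same and copied are for illustration.

```python
x = [5, 3, 4]
same = x       # another name for the same list
copied = x[:]  # a new list with the same elements

print(same is x)    # True
print(copied is x)  # False
print(copied == x)  # True: equal contents, different objects
```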

references and values

list: sort

sort

x = ["aaa", "bb", "cccc"]
x.sort(key = len) 
x

## ['bb', 'aaa', 'cccc']

x = [5, 3, 4]
x.sort(reverse = True) 
print(x)

## [5, 4, 3]
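key accepts any function of one element, including a lambda. The sketch below sorts made-up (name, age) records by age using sorted(), which leaves the original list unchanged.

```python
patients = [("Alice", 62), ("Beth", 45), ("Carl", 71)]  # hypothetical data

by_age = sorted(patients, key=lambda p: p[1])  # sort by the second field
print(by_age)  # [('Beth', 45), ('Alice', 62), ('Carl', 71)]

oldest_first = sorted(patients, key=lambda p: p[1], reverse=True)
print(oldest_first[0])  # ('Carl', 71)
```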

dictionary: basic operator

basic operator

phonebook = {"Alice": 2341, 
            "Beth": 4971,
            "Carl": 9401
}
phonebook

## {'Alice': 2341, 'Beth': 4971, 'Carl': 9401}

len(phonebook)

## 3

phonebook["Beth"]

## 4971

dictionary: update and delete

update and delete

phonebook["Alice"] = 1358
phonebook

## {'Alice': 1358, 'Beth': 4971, 'Carl': 9401}

adict = {"Alice": 9572}
phonebook.update(adict)
phonebook

## {'Alice': 9572, 'Beth': 4971, 'Carl': 9401}

del phonebook["Carl"]
"Beth" in phonebook

## True

dictionary: clear

clear

d = {}
d['name'] = "Amy"
d['age'] = 24
d

## {'name': 'Amy', 'age': 24}

d.clear()
d

## {}

why clear is useful

x = {}
y = x
x['key'] = 'value'
y

## {'key': 'value'}

x = {} ## now x points to a new value {}
y ## y points to the original value {'key': 'value'}

## {'key': 'value'}

x = {}
y = x
x['key'] = 'value'
y

## {'key': 'value'}

x.clear() ## clear the value x points to
y ## y still points to what x points to

## {}

references and values (part 2)

copy

shallow copy
- a new dictionary is created, but its values are references to the same objects as in the original (nested objects are not copied)

d = {}
d['username'] = "admin"
d['machines'] = ["foo", "bar"]
d

## {'username': 'admin', 'machines': ['foo', 'bar']}

c = d.copy()
c['username'] = "Alex" ## c['username'] points to a new value
print(c)

## {'username': 'Alex', 'machines': ['foo', 'bar']}

print(d)

## {'username': 'admin', 'machines': ['foo', 'bar']}

c['machines'].remove("bar") ## c and d share the same list, so the change is visible in both
print(c)

## {'username': 'Alex', 'machines': ['foo']}

print(d)

## {'username': 'admin', 'machines': ['foo']}

references and values (shallow copy)

copy

deep copy:
- recursively makes new copies of the values, including nested objects

from copy import deepcopy

d = {}
d['username'] = "admin"
d['machines'] = ["foo", "bar"]
d

## {'username': 'admin', 'machines': ['foo', 'bar']}

c = d.copy()
dc = deepcopy(d)
d['machines'].remove("bar") 
print(c)

## {'username': 'admin', 'machines': ['foo']}

print(dc)

## {'username': 'admin', 'machines': ['foo', 'bar']}
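The difference can be verified with the is operator: a shallow copy shares the nested list with the original, while a deep copy has its own. A sketch using the same dictionary:

```python
from copy import deepcopy

d = {"username": "admin", "machines": ["foo", "bar"]}
c = d.copy()      # shallow copy
dc = deepcopy(d)  # deep copy

print(c is d)                           # False: c is a new dictionary
print(c["machines"] is d["machines"])   # True: the nested list is shared
print(dc["machines"] is d["machines"])  # False: the nested list was copied
```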

references and values (deep copy)

dictionary initialization: fromkeys

create keys for an empty dictionary.

{}.fromkeys(["name", "age"])

## {'name': None, 'age': None}

create keys for a dictionary

dict.fromkeys(["name", "age"])

## {'name': None, 'age': None}

set default values

dict.fromkeys(["name", "age"], "unknown")

## {'name': 'unknown', 'age': 'unknown'}
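One caveat: fromkeys assigns the same default object to every key, which matters when the default is mutable, such as a list. A sketch of the pitfall and one common workaround (a dictionary comprehension); the variable names shared and separate are for illustration.

```python
keys = ["name", "age"]

shared = dict.fromkeys(keys, [])  # every key refers to the same list
shared["name"].append("Amy")
print(shared)  # {'name': ['Amy'], 'age': ['Amy']}

separate = {key: [] for key in keys}  # a fresh list for each key
separate["name"].append("Amy")
print(separate)  # {'name': ['Amy'], 'age': []}
```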

dictionary: get

The get method is more flexible than indexing by key.
When the key exists, get gives the same result as indexing by key.

d = {"name": "Amy", "age": 24}
d["name"]

## 'Amy'

d.get("name")

## 'Amy'

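get returns None when the key doesn’t exist, whereas indexing by key raises a KeyError

d["XX"] ## raises KeyError: 'XX'
d.get("XX") ## returns None
d.get("XX", "No exist") ## set your own return value for get; returns 'No exist'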
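A frequent use of get with a default value is counting. The sketch below tallies the words of a sentence; the dictionary name counts is an assumption for illustration.

```python
asentence = "to be or not to be"

counts = {}
for word in asentence.split():
    # get returns 0 the first time a word is seen
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```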

dictionary: items

items() returns all key-value pairs of the dictionary

phonebook = {"Alice": 2341, 
            "Beth": 4971,
            "Carl": 9401
}
phonebook

## {'Alice': 2341, 'Beth': 4971, 'Carl': 9401}

phonebook.items() ## this is an iterable

## dict_items([('Alice', 2341), ('Beth', 4971), ('Carl', 9401)])

list(phonebook.items())

## [('Alice', 2341), ('Beth', 4971), ('Carl', 9401)]

dictionary: loops

items() can be used to loop over a dictionary

it = phonebook.items()
for key, value in it:
    print(key +  "--> " + str(value))

## Alice--> 2341
## Beth--> 4971
## Carl--> 9401

if you only want the values, not the keys

it = phonebook.items()
for _, value in it:
    print(str(value))

## 2341
## 4971
## 9401

iterate over the keys of a dictionary directly

for key in phonebook:
    print(key +  "--> " + str(phonebook[key]))

## Alice--> 2341
## Beth--> 4971
## Carl--> 9401

use values() method

phonebook.values() ## this is an iterable

## dict_values([2341, 4971, 9401])

list(phonebook.values())

## [2341, 4971, 9401]

for i in phonebook.values():
    print(i)

## 2341
## 4971
## 9401

dictionary: pop and popitem

pop

phonebook = {"Alice": 2341, 
            "Beth": 4971,
            "Carl": 9401
}
phonebook.pop("Alice")

## 2341

phonebook

## {'Beth': 4971, 'Carl': 9401}

popitem(): remove and return the last inserted (key, value) pair

phonebook = {"Alice": 2341, 
            "Beth": 4971,
            "Carl": 9401
}
phonebook.popitem()

## ('Carl', 9401)

phonebook

## {'Alice': 2341, 'Beth': 4971}
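Like get, pop accepts a default value that is returned when the key is missing, which avoids a KeyError. A brief sketch; the key "Zoe" is made up to show the missing-key case.

```python
phonebook = {"Alice": 2341, "Beth": 4971, "Carl": 9401}

removed = phonebook.pop("Alice", None)  # key exists: its value is returned
print(removed)  # 2341

missing = phonebook.pop("Zoe", None)    # key absent: the default is returned
print(missing)  # None

print(phonebook)  # {'Beth': 4971, 'Carl': 9401}
```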

tuple: review basics

atuple = (0,1,2)
atuple += (3,4,5)
atuple

## (0, 1, 2, 3, 4, 5)

btuple = (0, 1, 1, ['I', 'like',  'python'])
btuple[3][0] = 'You'
print(btuple)

## (0, 1, 1, ['You', 'like', 'python'])

print(btuple.count(1))

## 2

print(btuple.index(['You', "like", 'python']))

## 3
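The tuple itself cannot be changed: assigning to one of its positions raises a TypeError, even though a list stored inside it can still be modified. Note also that += on a tuple creates a new tuple rather than altering the old one. A sketch; the names failed and before are for illustration.

```python
btuple = (0, 1, 1, ["I", "like", "python"])

try:
    btuple[0] = 99  # tuples do not support item assignment
    failed = False
except TypeError:
    failed = True
print(failed)  # True

btuple[3][0] = "You"  # allowed: this changes the list, not the tuple
print(btuple)  # (0, 1, 1, ['You', 'like', 'python'])

atuple = (0, 1, 2)
before = atuple
atuple += (3, 4, 5)      # builds a new tuple and rebinds the name
print(before)            # (0, 1, 2)
print(atuple is before)  # False
```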
