Lesson 4 Strings
Pragmatic AI Labs
This notebook was produced by Pragmatic AI Labs. You can continue learning about these topics by:
- Buying a copy of Pragmatic AI: An Introduction to Cloud-Based Machine Learning
- Reading an online copy of Pragmatic AI: An Introduction to Cloud-Based Machine Learning
- Watching the video Essential Machine Learning and AI with Python and Jupyter Notebook on Safari Books Online
- Watching the video AWS Certified Machine Learning - Specialty
- Purchasing the video Essential Machine Learning and AI with Python and Jupyter Notebook
- Viewing more content at noahgift.com
4.1 Use string methods
String Quoting
Single quotes
'Here is a string'
'Here is a string'
Double quotes
"Here is a string" == 'Here is a string'
True
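Having both quote styles lets a string contain one kind of quote without escaping it. A short sketch; the variable names are illustrative and not part of the lesson:

```python
# Double quotes let an apostrophe appear without a backslash
apostrophe = "It's a string"
# Single quotes let double quotes appear without a backslash
quoted = 'She said "hello"'
# Escaping with a backslash produces the same value as switching quote style
escaped = 'It\'s a string'

print(apostrophe == escaped)  # True
print(quoted)                 # She said "hello"
```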
Triple-Quoted Strings
a_very_large_phrase = """
Wikipedia is hosted by the Wikimedia Foundation,
a non-profit organization that also hosts a range of other projects.
"""
print(a_very_large_phrase)
Wikipedia is hosted by the Wikimedia Foundation,
a non-profit organization that also hosts a range of other projects.
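Triple-quoted strings keep every newline between the quotes, including the one right after the opening quotes. A minimal sketch with a hypothetical two-line phrase:

```python
# The newline right after the opening """ is part of the string
phrase = """
First line
Second line
"""
print(repr(phrase))  # '\nFirst line\nSecond line\n'

# strip() removes the leading and trailing newlines before splitting into lines
print(phrase.strip().splitlines())  # ['First line', 'Second line']
```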
Raw Strings
jon_jones = '...wrote on twitter he is the greatest "heavyw8e! \nfighter of all time'
print(jon_jones)
...wrote on twitter he is the greatest "heavyw8e!
fighter of all time
jon_jones = r'...wrote on twitter he is the greatest "heavyw8e! \nfighter of all time'
print(jon_jones)
...wrote on twitter he is the greatest "heavyw8e! \nfighter of all time
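Raw strings treat backslashes as literal characters, which is why they are the usual choice for regular expression patterns. A small sketch using the standard re module; the sample text is made up:

```python
import re

# In a normal string "\n" is one character (a newline);
# in a raw string it is two characters: a backslash and "n"
print(len("\n"), len(r"\n"))  # 1 2

# Raw strings keep regex escapes such as \d readable
pattern = r"\d+"
print(re.findall(pattern, "UFC 235 and UFC 247"))  # ['235', '247']
```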
Case Manipulation
captain = "Patrick Tayluer"
captain
'Patrick Tayluer'
captain.capitalize()
'Patrick tayluer'
captain.lower()
'patrick tayluer'
captain.upper()
'PATRICK TAYLUER'
captain.swapcase()
'pATRICK tAYLUER'
captain = 'patrick tayluer'
captain.title()
'Patrick Tayluer'
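Two case-related details worth knowing: title() capitalizes after any non-letter (including apostrophes), and casefold() is a stronger form of lower() meant for case-insensitive comparison. A short sketch with illustrative strings:

```python
# title() starts a new "word" after every non-letter, including apostrophes
print("they're bill's friends".title())  # They'Re Bill'S Friends

# casefold() folds more characters than lower(), e.g. German sharp s
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
print("STRASSE".casefold() == "Straße".casefold())  # True
```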
Interrogation
river = 'Mississippi'
len(river)
11
river.count('s')
4
river.index('pp')
8
river.index('r')
ValueError Traceback (most recent call last)
----> 1 river.index('r')
ValueError: substring not found
river.find('r')
-1
river.startswith('M')
True
river.endswith('i')
True
'sip' in river
True
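Because index() raises and find() returns -1, they suit different situations. A sketch of handling a missing substring either way, plus the optional start argument and the non-overlapping behaviour of count():

```python
river = 'Mississippi'

# find() signals "not found" with -1, so check before using the result
position = river.find('r')
if position == -1:
    print("no 'r' in", river)

# index() raises ValueError instead; catch it when a miss is expected
try:
    river.index('r')
except ValueError as err:
    print("index failed:", err)  # index failed: substring not found

# Both accept an optional start position
print(river.index('s', 4))  # 5

# count() counts non-overlapping occurrences
print('aaaa'.count('aa'))  # 2
```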
Content Type
'abc123'.isalpha()
False
'abc123'.isalnum()
True
'lowercase'.islower()
True
'lowercase'.isupper()
False
'The Good Ship'.istitle()
True
'The bad seed'.istitle()
False
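A few edge cases of the content-type checks that often surprise people when validating input; the strings below are illustrative:

```python
# isalpha(), isalnum() and islower() all return False for the empty string
print(''.isalpha(), ''.isalnum(), ''.islower())  # False False False

# A single space is enough to make isalnum() fail
print('abc 123'.isalnum())  # False

# islower() ignores characters without case, such as digits
print('abc123'.islower())  # True
```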
More information: String Methods
4.2 Format strings
F-strings were introduced in Python 3.6. They are marked by an 'F' or 'f' prefix before the opening quotation mark. Values can be inserted into f-strings at runtime using replacement fields, which are delimited by curly braces.
Insert variable into replacement field
strings_count = 5
frets_count = 24
f"Noam Pikelny's banjo has {strings_count} strings and {frets_count} frets"
"Noam Pikelny's banjo has 5 strings and 24 frets"
Insert expression into replacement field
a = 12
b = 32
f"{a} times {b} equals {a*b}"
'12 times 32 equals 384'
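Replacement fields accept any expression, including method calls. Since Python 3.8 a trailing = inside the braces also echoes the expression text, which is handy for quick debugging. A sketch (requires Python 3.8 or later; name is a hypothetical variable):

```python
a = 12
b = 32

# A trailing "=" prints the expression text followed by its value
print(f"{a=}, {b=}")  # a=12, b=32
print(f"{a*b=}")      # a*b=384

# Method calls and function calls work inside the braces
name = "noam"
print(f"{name.title()} played {len(['banjo', 'guitar'])} instruments")  # Noam played 2 instruments
```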
Index a list inside replacement fields
players = ["Tony Trischka", "Bill Evans", "Alan Munde"]
f"Performances will be held by {players[1]}, {players[0]}, and {players[2]}"
'Performances will be held by Bill Evans, Tony Trischka, and Alan Munde'
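Dictionary lookups, negative indexes and slices also work inside replacement fields. Before Python 3.12 the key must use a different quote character than the f-string itself. A sketch with a hypothetical tunings dictionary:

```python
tunings = {"banjo": "gDGBD", "fiddle": "GDAE"}

# Single quotes for the key inside a double-quoted f-string
print(f"The fiddle is tuned {tunings['fiddle']}")  # The fiddle is tuned GDAE

# Negative indexes and slices work too
players = ["Tony Trischka", "Bill Evans", "Alan Munde"]
print(f"Last up: {players[-1]}; first two: {players[:2]}")
```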
Conversion flags
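A conversion flag, written as ! followed by a letter after the expression, converts the value to a string before it is formatted. The three available flags are !s, which calls str(); !r, which calls repr(); and !a, which calls ascii().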
Using str conversion
nuts = [1,2,3,4,5]
f"Calling str() on the list {nuts} produces {nuts!s}"
'Calling str() on the list [1, 2, 3, 4, 5] produces [1, 2, 3, 4, 5]'
Using repr conversion
nut = 'pistachio'
f"Calling repr on the string {nut} results in {nut!r}"
"Calling repr on the string pistachio results in 'pistachio'"
Using ascii conversion
check = "√"
f"The ascii version of {check} is {check!a}"
"The ascii version of √ is '\\u221a'"
Padding a number
lucky_num = 13
f"To pad the number {lucky_num} to 5 places:{lucky_num:5d}"
'To pad the number 13 to 5 places:   13'
Setting padding value at runtime
lucky_num = 13
padding = 5
f"To pad the number {lucky_num} to {padding} places:{lucky_num:{padding}d}"
'To pad the number 13 to 5 places:   13'
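Width is only one part of the format specification after the colon. A sketch of a few other common specs, and of how a conversion flag combines with them (nut is reused here only as sample data):

```python
lucky_num = 13
nut = 'pistachio'

print(f"[{lucky_num:05d}]")  # [00013]  zero-padded to width 5
print(f"[{lucky_num:<5d}]")  # [13   ]  left-aligned
print(f"[{lucky_num:^5d}]")  # [ 13  ]  centered
print(f"{1234567:,}")        # 1,234,567  thousands separator
print(f"{3.14159:.2f}")      # 3.14  two decimal places

# The conversion flag comes before the colon: repr() runs first, then padding applies
print(f"[{nut!r:>14}]")      # [   'pistachio']
```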
More information: Format String Syntax
Other String Formatting: String Format Method
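For comparison, the older str.format() method uses the same replacement-field syntax, but the values are passed as arguments instead of being looked up by name in the surrounding scope. A brief sketch:

```python
strings_count = 5
frets_count = 24

# Positional fields are filled in order
print("The banjo has {} strings and {} frets".format(strings_count, frets_count))

# Fields can also be numbered or named, and take the same format specs as f-strings
print("{0} strings, {0} tuners".format(strings_count))  # 5 strings, 5 tuners
print("{count:03d} frets".format(count=frets_count))     # 024 frets
```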
4.3 Manipulate strings
Concatenation
"Bob" + "beroo"
'Bobberoo'
"AB" * 8
'ABABABABABABABAB'
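The + operator only joins a string to another string, and for many pieces str.join is the idiomatic approach. A short sketch:

```python
# + raises TypeError when one operand is not a str
try:
    "Track " + 5
except TypeError as err:
    print("TypeError:", err)

# Convert explicitly instead
print("Track " + str(5))  # Track 5

# For many pieces, join them in one call
pieces = ["Bob", "be", "roo"]
print("".join(pieces))    # Bobberoo
```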
Remove Whitespace
ship = " The Yankee Clipper "
ship
' The Yankee Clipper '
ship.strip()
'The Yankee Clipper'
ship.lstrip()
'The Yankee Clipper '
ship.rstrip()
' The Yankee Clipper'
ship.rstrip("per ")
' The Yankee Cli'
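The argument to strip(), lstrip() and rstrip() is a set of characters, not a suffix, which explains the result above. To remove one exact suffix, Python 3.9 added removesuffix() (and removeprefix()). A sketch:

```python
ship = " The Yankee Clipper "

# rstrip("per ") removes any trailing run of 'p', 'e', 'r' or ' ' characters,
# so it strips "pper " rather than just the text "per"
print(repr(ship.rstrip("per ")))  # ' The Yankee Cli'

# removesuffix() drops one exact suffix (Python 3.9+)
print(repr(ship.strip().removesuffix("Clipper")))  # 'The Yankee '
```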
Add padding
port = "Boston"
port.center(12, '*')
'***Boston***'
port.ljust(12, '*')
'Boston******'
port.rjust(12, '*')
'******Boston'
for port_city in ['Liverpool', 'Boston', 'New York', 'Philadelphia']:
    print(port_city.rjust(12))
   Liverpool
      Boston
    New York
Philadelphia
'-5'.zfill(4)
'-005'
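The fill and alignment part of a format spec can produce the same padding as these methods. A sketch checking the equivalence for the Boston example:

```python
port = "Boston"

# Fill character, then ^ (center), < (left) or > (right), then width
assert f"{port:*^12}" == port.center(12, '*')
assert f"{port:*<12}" == port.ljust(12, '*')
assert f"{port:*>12}" == port.rjust(12, '*')

# zfill() keeps the sign in front of the zeros; a 0-prefixed width does the same for numbers
print('-5'.zfill(4), f"{-5:04d}")  # -005 -005
```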
Replace
"FILADELFIA".replace("F", "PH")
'PHILADELPHIA'
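replace() accepts an optional count, and like every string method it returns a new string rather than modifying the original. A sketch:

```python
# The third argument limits how many replacements are made
print("banana".replace("a", "o", 1))  # bonana
print("banana".replace("a", "o"))     # bonono

# The original string is unchanged
city = "FILADELFIA"
fixed = city.replace("F", "PH")
print(city, fixed)  # FILADELFIA PHILADELPHIA
```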
Splitting and Joining
words_string = "Here,Are,Some,Words"
words_string
'Here,Are,Some,Words'
Split on comma
words = words_string.split(',')
words
['Here', 'Are', 'Some', 'Words']
Joining
':'.join(words)
'Here:Are:Some:Words'
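A few more split() options: with no argument it splits on runs of whitespace, maxsplit limits the number of pieces, and rsplit() works from the right. A sketch:

```python
# No argument: split on any run of whitespace and drop empty strings
print("  Here   Are Some\tWords ".split())  # ['Here', 'Are', 'Some', 'Words']

# maxsplit limits the number of splits
print("Here,Are,Some,Words".split(',', 1))   # ['Here', 'Are,Some,Words']
print("Here,Are,Some,Words".rsplit(',', 1))  # ['Here,Are,Some', 'Words']

# join() with the same separator reverses split()
words = "Here,Are,Some,Words".split(',')
print(','.join(words) == "Here,Are,Some,Words")  # True
```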
Split on newline
multiline = "Sometimes we are given\na multiline document\nas a single string"
multiline
'Sometimes we are given\na multiline document\nas a single string'
for line in multiline.splitlines():
    print(line)
Sometimes we are given
a multiline document
as a single string
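splitlines() differs from split('\n') in how it treats a trailing newline and other line endings. A sketch with a hypothetical two-line text:

```python
text = "first line\nsecond line\n"

# split('\n') leaves an empty string after a trailing newline; splitlines() does not
print(text.split('\n'))   # ['first line', 'second line', '']
print(text.splitlines())  # ['first line', 'second line']

# keepends=True keeps the line terminators
print(text.splitlines(keepends=True))  # ['first line\n', 'second line\n']

# Windows-style line endings are recognised too
print("a\r\nb".splitlines())  # ['a', 'b']
```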
Slicing
collector = "William Main Doerflinger"
collector[0]
'W'
collector[-1]
'r'
collector[13:18]
'Doerf'
collector[-7:]
'flinger'
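Slices also take an optional step, and because strings are immutable, indexing can read characters but never assign them. A sketch:

```python
collector = "William Main Doerflinger"

print(collector[:7])     # William  (start defaults to 0)
print(collector[::-1])   # regnilfreoD niaM mailliW  (a negative step reverses)
print(collector[0:7:2])  # Wlim  (every second character)

# Strings are immutable: assigning to an index raises TypeError
try:
    collector[0] = 'w'
except TypeError:
    print("strings cannot be modified in place")
```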
More information: common sequence operations
4.4 Learn to use unicode
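There are multiple encodings for mapping characters to bytes. In Python 3, a str is a sequence of Unicode code points; source files are read as UTF-8 by default, and str.encode() uses UTF-8 unless another encoding is given. In Python 2, the default str type held bytes and the default encoding was ASCII.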
Encode
twice_pie = 'ππ'
twice_pie
'ππ'
twice_π = twice_pie
twice_π
'ππ'
pie = "\N{GREEK CAPITAL LETTER PI}"
pie
'Π'
ord(pie)
928
chr(928)
'Π'
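Code points are conventionally written in hexadecimal as U+XXXX, and the standard unicodedata module can look up a character's official name. A sketch:

```python
import unicodedata

pie = "\N{GREEK CAPITAL LETTER PI}"

# ord() gives the code point; format it as four uppercase hex digits
print(f"U+{ord(pie):04X}")    # U+03A0

# A \u escape and a \N{...} name can refer to the same character
print("\u03a0" == pie)        # True

# unicodedata.name() is the inverse of \N{...}
print(unicodedata.name(pie))  # GREEK CAPITAL LETTER PI
print(unicodedata.name("π"))  # GREEK SMALL LETTER PI
```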
u = chr(40960) + 'abcd' + chr(1972)
u
'ꀀabcd\u07b4'
u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
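Encoding turns a str into bytes and decoding reverses it; a non-ASCII character can take several bytes, and the errors argument controls what happens when an encoding cannot represent a character. A sketch with an illustrative string:

```python
text = "π = 3.14"

data = text.encode('utf-8')          # str -> bytes
print(len(text), len(data))          # 8 9  (π takes two bytes in UTF-8)
print(data.decode('utf-8') == text)  # True  (bytes -> str round trip)

# ASCII cannot represent π; errors='replace' substitutes '?' instead of raising
print(text.encode('ascii', errors='replace'))  # b'? = 3.14'
```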
Saving a File as UTF-8
with open("new_file.txt", "w", encoding='utf-8') as opened_file:
    opened_file.write("Søme Unˆcode text")
!cat new_file.txt
Søme Unˆcode text
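Reading the file back needs the same encoding. A sketch using pathlib inside a temporary directory, so it leaves no file behind; the sample text is illustrative:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "new_file.txt"
    # Pass the encoding explicitly; the platform default is not guaranteed to be UTF-8
    path.write_text("Søme Unicode text", encoding='utf-8')
    # Decode with the same encoding to recover the original string
    restored = path.read_text(encoding='utf-8')
    raw = path.read_bytes()

print(restored)                 # Søme Unicode text
print(len(restored), len(raw))  # 17 18  (ø takes two bytes)
```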