Exercises: OSEMN (Obtaining, Scrubbing, Exploring, Modeling, iNterpreting)
As part of the "Data science is OSEMN" module on Obtaining Data, I worked through the following exercises.
Example Code
# Exercises
"""http://people.duke.edu/~ccc14/sta-663/DataProcessingSolutions.html#exercises"""
"""1. Write the following sentences to a file “hello.txt” using open and write.
There should be 3 lines in the resulting file.
Hello, world.
Goodbye, cruel world.
The world is your oyster."""
path = 'Data/Test.txt'  # renamed from `str` so the built-in is not shadowed
with open(path, 'w') as f:
    f.write('Hello, world.\n')
    f.write('Goodbye, cruel world.\n')
    f.write('The world is your oyster.\n')
# Writes the same thing, only in one statement.
# Reopening with 'w' overwrites the file, so it still ends up with 3 lines.
with open(path, 'w') as f:
    f.write('Hello, world.\nGoodbye, cruel world.\nThe world is your oyster.\n')
with open(path, 'r') as f:
    content = f.read()
print(content)
"""2. Using a for loop and open, print only the lines from the
file ‘hello.txt’ that begin with ‘Hello’ or ‘The’."""
with open(path, 'r') as f:
    for line in f:
        if line.startswith('Hello') or line.startswith('The'):
            print(line, end='')  # each line already ends with a newline
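The two file exercises above can also be checked end to end without touching the working directory. The following is only a sketch of one way to do that: the temporary directory and the helper names `write_lines` and `lines_starting_with` are my own assumptions, not part of the original exercises. It writes the three sentences with `'\n'` line endings and then filters the lines by prefix.

```python
import os
import tempfile

SENTENCES = ['Hello, world.', 'Goodbye, cruel world.', 'The world is your oyster.']


def write_lines(path, lines):
    """Write each string in `lines` to `path` on its own line."""
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')


def lines_starting_with(path, prefixes):
    """Return the lines of `path` that begin with any of `prefixes`."""
    with open(path) as f:
        # str.startswith accepts a tuple of candidate prefixes
        return [line.rstrip('\n') for line in f
                if line.startswith(tuple(prefixes))]


# Hypothetical location: a throwaway directory that is removed afterwards
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'hello.txt')
    write_lines(demo_path, SENTENCES)
    matches = lines_starting_with(demo_path, ['Hello', 'The'])

for line in matches:
    print(line)
```

Passing a tuple to `startswith` avoids chaining several `or` conditions, which helps once there are more than two prefixes.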
"""3. Most of the time, tabular files can be read correctly using
convenience functions from pandas. Sometimes, however, line-by-line processing
of a file is unavoidable, typically when the file originated from
an Excel spreadsheet. Use the csv module and a for loop to create
a pandas DataFrame for the file ugh.csv."""
# Reading a csv line by line
import csv
import pandas as pd
# newline='' lets the csv module handle line endings itself
with open('Data/OECD - Quality of Life.csv', newline='') as f:
    columns = None
    rows = []
    for line in csv.reader(f):
        if columns is None:
            columns = line  # the first row holds the column names
        else:
            rows.append(line)  # every later row is data (all values are strings)
# Build the DataFrame once, instead of setting it cell by cell
tempDf = pd.DataFrame(rows, columns=columns)
tempDf
# the easy way
otherDf = pd.read_csv('Data/OECD - Quality of Life.csv')
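One practical difference between the two approaches is column types: `csv.reader` hands back every field as a string, while `pd.read_csv` infers numeric types. The sketch below illustrates this on a tiny in-memory table built with `io.StringIO`. The column names and values are made up for illustration and are not taken from the OECD file.

```python
import csv
import io

import pandas as pd

# Illustrative data only; not taken from the OECD file.
SAMPLE = "country,score\nA,7.5\nB,6.0\n"

# Line by line: the first row is the header, the remaining rows are data.
header, rows = None, []
for record in csv.reader(io.StringIO(SAMPLE)):
    if header is None:
        header = record
    else:
        rows.append(record)
manual_df = pd.DataFrame(rows, columns=header)

# Every value read through csv.reader is still a Python string here.
raw_types = {type(value) for value in manual_df['score']}

# Convenience function: parses the text and infers dtypes in one call.
easy_df = pd.read_csv(io.StringIO(SAMPLE))

# Convert the manually built column explicitly to get numbers.
manual_df['score'] = pd.to_numeric(manual_df['score'])

print(raw_types)
print(easy_df.dtypes)
print(manual_df.dtypes)
```

So a line-by-line reader usually needs an explicit conversion step (here `pd.to_numeric`) before the result behaves like the one from `pd.read_csv`.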