Exercises: OSEMN (Obtaining, Scrubbing, Exploring, Modeling, iNterpreting)

As part of the "Data Science is OSEMN" module on Obtaining Data, I worked through the exercises.

Example Code

 # Exercises
 """1. Write the following sentences to a file “hello.txt” using open and write.
 There should be 3 lines in the resulting file.
 Hello, world.
 Goodbye, cruel world.
 The world is your oyster."""
 # Named `path` rather than `str` so the built-in str is not shadowed;
 # forward slashes avoid accidental backslash escapes in the string
 path = 'Data/Test.txt'
 with open(path, 'w') as f:
   f.write('Hello, world.\n')
   f.write('Goodbye, cruel world.\n')
   f.write('The world is your oyster.\n')
 # Writes the same thing, only in one statement (mode 'w' overwrites the file)
 with open(path, 'w') as f:
   f.write('Hello, world.\nGoodbye, cruel world.\nThe world is your oyster.\n')
 # Read the file back to check the result
 with open(path, 'r') as f:
   content = f.read()
 """2. Using a for loop and open, print only the lines from the
 file ‘hello.txt’ that begin with ‘Hello’ or ‘The’."""
 with open(path, 'r') as f:
   for line in f:
     if line.startswith('Hello') or line.startswith('The'):
       print(line, end='')  # each line already ends with a newline
 """3. Most of the time, tabular files can be read correctly using
 convenience functions from pandas. Sometimes, however, line-by-line processing
 of a file is unavoidable, typically when the file originated from
 an Excel spreadsheet. Use the csv module and a for loop to create
 a pandas DataFrame for the file ugh.csv."""
 # Reading a csv line by line
 import csv
 import pandas as pd
 csv_path = 'Data/OECD - Quality of Life.csv'
 with open(csv_path, newline='') as f:
   reader = csv.reader(f)
   header = next(reader)  # the first row holds the column names
   rows = []
   for line in reader:
     rows.append(line)  # every value is still a string at this point
 tempDf = pd.DataFrame(rows, columns=header)
 # the easy way
 otherDf = pd.read_csv(csv_path)
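To check the three exercises without touching any real files, here is a minimal sketch that uses only the standard library. The helper names (`write_sentences`, `lines_starting_with`, `read_csv_rows`) and the tiny two-row CSV string are made up for illustration and are not part of the original exercises. It writes the sentences to a temporary file, filters the lines by prefix, and splits CSV text into a header and data rows, the same shape you would then hand to `pd.DataFrame(rows, columns=header)`.

```python
import csv
import io
import os
import tempfile

SENTENCES = ["Hello, world.", "Goodbye, cruel world.", "The world is your oyster."]


def write_sentences(path, sentences):
    """Write one sentence per line, each terminated by a newline."""
    with open(path, "w") as f:
        for sentence in sentences:
            f.write(sentence + "\n")


def lines_starting_with(path, prefixes):
    """Return the lines of the file that begin with any of the given prefixes."""
    matches = []
    with open(path) as f:
        for line in f:
            if line.startswith(prefixes):  # str.startswith accepts a tuple
                matches.append(line.rstrip("\n"))
    return matches


def read_csv_rows(text):
    """Split CSV text into a header row and a list of data rows (all strings)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row holds the column names
    rows = []
    for row in reader:
        rows.append(row)
    return header, rows


# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "hello.txt")  # hypothetical demo file
    write_sentences(demo_path, SENTENCES)
    with open(demo_path) as f:
        line_count = len(f.readlines())
    selected = lines_starting_with(demo_path, ("Hello", "The"))

# Made-up sample data, only to show what csv.reader yields.
header, rows = read_csv_rows("country,score\nA,1.5\nB,2.0\n")

print(line_count)
print(selected)
print(header, rows)
```

Note that `csv.reader` returns every field as a string, so a DataFrame built from these rows needs its numeric columns converted explicitly, whereas `pd.read_csv` infers column types on its own.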

Sample Data

