{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Basic Programming Using Python: Files and Lists" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Objectives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Learn how to open a file and read in data.\n", "- Understand how to interpret and remove newlines in Python.\n", "- Use the `ipythonblocks` library to create grids of colored cells based on characters in a file.\n", "- Introduce the list data structure and learn how to manipulate lists in Python.\n", "- Use lists to store lines from a data file in order to generate multi-dimensional color grids.\n" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "File I/O: Reading in Data From Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our [previous lesson](python-3-conditionals-defensive.ipynb),\n", "we learned how to set the colors in a grid based on the characters in a string.\n", "However, what if we want to set the colors based on the characters contained in a file? Here we will learn how to open files and read in lines of data for use in setting the colors of a grid. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a refresher, here's our coloring function. It takes an `ImageGrid` object, `grid`, and colors it based on the characters in the string `data`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def color_from_string(grid, data):\n", "    \"Color grid cells red and green according to 'R' and 'G' in data.\"\n", "    assert grid.width == len(data), \\\n", "        'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))\n", "    for x in range(grid.width):\n", "        assert data[x] in 'GR', \\\n", "            'Unknown character in data string: \"{0}\"'.format(data[x])\n", "        if data[x] == 'R':\n", "            grid[x, 0] = colors['Red']\n", "        else:\n", "            grid[x, 0] = colors['Green']" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's how we use it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from ipythonblocks import ImageGrid, colors\n", "\n", "row = ImageGrid(5, 1)\n", "color_from_string(row, 'RRGRR')\n", "row.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a conventional text editor, we can create a text file that contains just that string:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_rrgrr.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRGRR\r\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's read it into our program:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "reader = open('grid_rrgrr.txt', 'r')\n", "line = reader.readline()\n", "reader.close()\n", "print 'line is:', line" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "line is: RRGRR\n", "\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **first line** of our program uses a built-in function called `open` to open our file.\n", "`open`'s first parameter specifies the file we want to open;\n", "the second parameter,\n", "`'r'`,\n", "signals that we want to read the file.\n", "(We can use `'w'` to write files,\n", "which we'll explore later.)\n", "`open` returns a special object that keeps track of which file we opened,\n", "and how much of its data we've read.\n", "This object is sometimes called a [file handle](glossary.html#file_handle),\n", "and we can assign it to a variable like any other value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **second line** of our program asks the file handle's `readline` method\n", "to read the first line from the file\n", "and give it back to us as a string." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **third line** of the program asks the file handle to close itself\n", "(i.e., to disconnect from the file). 
\n", "\n", "\n", "*Important note on closing files:* When we open a file,\n", "the operating system creates a connection between our program and that file.\n", "For performance and security reasons,\n", "it will only let a single program have a fixed number of files open at any one time,\n", "and will only allow a single file to be opened by a fixed number of programs at once.\n", "Both limits are typically up in the thousands,\n", "and the operating system automatically closes open files\n", "when a program finishes running,\n", "so we're unlikely to run into problems most of the time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But that's precisely what makes this problematic.\n", "Something that only goes wrong when we're doing something large\n", "is much harder to debug than something that also goes wrong in the small.\n", "It's therefore a very good idea to get into the habit of *closing files\n", "as soon as they're no longer needed*.\n", "In fact,\n", "it's such a good idea that Python and other languages\n", "have a way to guarantee that it happens automatically." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_rrgrr.txt', 'r') as reader:\n", "    line = reader.readline()\n", "    print 'line is:', line" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "line is: RRGRR\n", "\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `with...as...` statement takes whatever is created by its first part\n", "(in our case,\n", "the result of opening a file)\n", "and assigns it to the variable given in its second part.\n", "It then executes a block of code,\n", "and when that block is finished,\n", "it cleans up the stored value.\n", "\"Cleaning up\" a file means closing it;\n", "it means different things for databases and connections to hardware devices,\n", "but in every case,\n", "Python guarantees that the cleanup happens as soon as the block ends,\n", "even if the code inside the block fails with an error.\n", "We'll use `with` statements for file I/O from now on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, on the **fourth line** of our original program, we print the string we read.\n", "The result is `'RRGRR'`,\n", "just as expected." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or is it?\n", "Let's take a look at `line`'s length:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print len(line)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "6\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why does `len` tell us there are six characters instead of five?\n", "We can use another function called `repr` to take a closer look\n", "at what we actually read:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print repr(line)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "'RRGRR\\n'\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`repr` stands for \"representation\".\n", "It returns whatever we'd have to type into a Python program\n", "to create the thing we've given it as a parameter.\n", "In this case,\n", "it's telling us that our string contains 'R', 'R', 'G', 'R', 'R', and '\\n'.\n", "That last thing is called an [escape sequence](glossary.html#escape_sequence),\n", "and it's how Python represents a [newline character](glossary.html#newline_character)\n", "in a string.\n", "We can use other escape sequences to represent other special characters:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'We\\'ll put a single quote in a single-quoted string.'\n", "print \"Or we\\\"ll put a double quote in a double-quoted string.\"\n", "print 'This\\nstring\\ncontains\\nnewlines.'\n", "print 'And\\tthis\\tone\\tcontains\\ttabs.'" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "We'll put a single quote in a single-quoted string.\n", "Or we\"ll put a double quote in a double-quoted string.\n", "This\n", "string\n", "contains\n", "newlines.\n", "And	this	one	contains	tabs.\n" ] } ], 
"prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### *Carriage Return, Newline, and All That*\n", "\n", "\n", "If we create our file on Windows,\n", "it might contain 'RRGRR\\r\\n' instead of 'RRGRR\\n'.\n", "The '\\r' is a [carriage return](glossary.html#carriage_return),\n", "and it's there because Windows uses two characters to mark the ends of lines\n", "rather than just one.\n", "There's no reason to prefer one convention over the other,\n", "but problems do arise when we create files one way\n", "and try to read them with programs that expect the other.\n", "Python does its best to shield us from this\n", "by converting Windows-style '\\r\\n' end-of-line markers to '\\n'\n", "as it reads data from files.\n", "If we really want to keep the original line endings,\n", "we need to use `'rb'` (for \"read binary\") when we open the file\n", "instead of just `'r'`.\n", "For more on this and other madness,\n", "see Joel Spolsky's article\n", "[The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html).\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The easiest way to get rid of our annoying newline character\n", "is to use `str.strip`,\n", "i.e.,\n", "the `strip` method of the string data type.\n", "As its interactive help says:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "help(str.strip)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Help on method_descriptor:\n", "\n", "strip(...)\n", " S.strip([chars]) -> string or unicode\n", " \n", " Return a copy of the string S with leading and trailing\n", " whitespace removed.\n", " If chars is given and not None, remove characters in chars instead.\n", " If chars is unicode, S will be converted to unicode before stripping\n", "\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`str.strip` creates a new string by removing any leading or trailing [whitespace](glossary.html#whitespace) characters\n", "from the original (`str.lstrip` and `str.rstrip` remove only leading or trailing whitespace, respectively).\n", "Whitespace includes carriage return,\n", "newline,\n", "tab,\n", "and the familiar space character,\n", "so stripping the string also takes care of any accidental indentation\n", "or (invisible) trailing spaces:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "original = ' indented with trailing spaces '\n", "stripped = original.strip()\n", "print '|{0}|'.format(stripped)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "|indented with trailing spaces|\n" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use this to fix our string and initialize our grid.\n", "In fact,\n", "let's write a function that takes a grid and a filename as parameters\n", "and fills the grid using the color specification in that file:" ] }, { "cell_type": "code", "collapsed": false, "input": 
[ "def color_from_file(grid, filename):\n", "    'Color the cells in a grid using a spec stored in a file.'\n", "    with open(filename, 'r') as reader:\n", "        line = reader.readline()\n", "    line = line.strip()\n", "    color_from_string(grid, line)\n", "\n", "another_row = ImageGrid(5, 1)\n", "color_from_file(another_row, 'grid_rrgrr.txt')\n", "another_row.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's progress,\n", "but we can do better.\n", "When we were creating grids *and* color strings in the same program,\n", "it was fairly easy to make sure the grid and the string were the same size.\n", "Opening a text file in an editor and\n", "counting the characters on the first line\n", "will be a lot more painful,\n", "so why don't we create the grid\n", "based on how long the string is?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def create_from_file(filename):\n", "    'Create and color a grid using a spec stored in a file.'\n", "    with open(filename, 'r') as reader:\n", "        line = reader.readline()\n", "    line = line.strip()\n", "    grid = ImageGrid(len(line), 1)\n", "    color_from_string(grid, line)\n", "    return grid\n", "\n", "newly_made = create_from_file('grid_rrgrr.txt')\n", "newly_made.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is starting to look like our friend `skimage.novice.open`:\n", "given a filename,\n", "it loads the data from that file into a suitable object in memory\n", "and gives the object back to us for further use.\n", "What's more,\n", "it does the actual coloring by calling `color_from_string`,\n", "a function that fills in a grid which is already in memory,\n", "so all of our functions fill grids in exactly the same way\n", "without any duplicated code." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Using Lists to Generate Multi-dimensional Color Grids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A single row of pixels is a lot less interesting than an actual image,\n", "but before we can read the latter,\n", "we need to learn how to use [lists](glossary.html#list).\n", "Just as a `for` loop is a way to do operations many times,\n", "a list is a way to store many values in one variable.\n", "To start our exploration of lists,\n", "try this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "odds = [1, 3, 5]\n", "for number in odds:\n", "    print number" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n", "3\n", "5\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`[1, 3, 5]` is a list.\n", "Its elements are written in square brackets and separated by commas,\n", "and just as a `for` loop over a string works on its characters one at a time,\n", "a `for` loop over a list processes the list's values one by one." 
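] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see that parallel directly, here is a small sketch (the string `'RRG'` and the list `['R', 'R', 'G']` are just illustrative values):\n", "the same loop shape works over a string and over a list of one-character strings." ] }, { "cell_type": "code", "collapsed": false, "input": [ "for char in 'RRG':\n", "    print char\n", "for value in ['R', 'R', 'G']:\n", "    print value" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both loops should print `R`, `R`, and `G` on separate lines:\n", "in each case the loop variable takes on the elements one at a time,\n", "whether they are the characters in a string or the values in a list." 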
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do something a bit more useful with a list of numbers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3, 3, 4, 3, 4, 1]\n", "total = 0.0\n", "for n in data:\n", " total += n\n", "mean = total / len(data)\n", "print 'mean is', mean" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "mean is 2.77777777778\n" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "By now,\n", "the logic here should be fairly easy to follow.\n", "`data` refers to our list,\n", "and `total` is initialized to 0.0.\n", "Each iteration of the loop adds the next number from the list to `total`,\n", "and when we're done,\n", "we divide the result by the list's length to get the mean.\n", "(Note that we initialize `total` to 0.0 rather than 0,\n", "so that it is always a floating-point number.\n", "If we didn't do this,\n", "its final value might be an integer,\n", "and the division could give us a truncated approximation to the actual mean.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### *A Simpler Way*\n", "\n", "\n", "Python actually has a built-in function called `sum` that does what our loop does,\n", "so we can calculate the mean more simply like this:\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'mean is', float(sum(data)) / len(data)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " mean is 2.77777777778\n" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Again,\n", "it's important to understand that `float(sum(data) / len(data))` might not give the right answer,\n", "since it would do integer/integer division first (producing a possibly-truncated result)\n", "and only then convert that value to a float.\n", "\n", "
" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "A Deeper Look at Lists in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists are probably used more than any other data structure in programming,\n", "so let's have a closer look at them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**First**, lists are ordered (i.e., they are sequences),\n", "and we can fetch an element from a list by its index, counting from 0:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "values = [1, 3, 5]\n", "print values[0]\n", "print values[1]\n", "print values[2]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n", "3\n", "5\n" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Second**, lists are [mutable](glossary.html#mutable),\n", "i.e.,\n", "they can be changed after they are created:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "values = [1, 3, 5]\n", "values[0] = 'one'\n", "values[1] = 'three'\n", "values[2] = 'five'\n", "print values" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['one', 'three', 'five']\n" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As the diagrams below show,\n", "this works because the list doesn't actually contain the values themselves.\n", "Instead,\n", "it stores [references](glossary.html#reference) to values.\n", "When we assign something to `values[0]`,\n", "what we're really doing is putting a different reference in that location in the list. 
*Let's quickly go through the block of code above line by line:*" ] }, { "cell_type": "code", "collapsed": false, "input": [ "values = [1, 3, 5]\n", "print values" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[1, 3, 5]\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Python list references](files/list_refs_1.png)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "values = [1, 3, 5]\n", "values[0] = 'one'\n", "print values" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['one', 3, 5]\n" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Python list references](files/list_refs_2.png)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "values = [1, 3, 5]\n", "values[0] = 'one'\n", "values[1] = 'three'\n", "print values" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['one', 'three', 5]\n" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Python list references](files/list_refs_3.png)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "values = [1, 3, 5]\n", "values[0] = 'one'\n", "values[1] = 'three'\n", "values[2] = 'five'\n", "print values" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['one', 'three', 'five']\n" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Python list references](files/list_refs_4.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Third**, lists are variable-length: they can grow and shrink in place using methods such as `append` and `remove`. 
For example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3]\n", "result = []\n", "print 'The length of result before: ', len(result)\n", "current = 0\n", "for n in data:\n", " current = current + n\n", " result.append(current)\n", "print 'running total:', result\n", "print 'The length of result after: ', len(result)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "The length of result before: 0\n", "running total: [1, 5, 7, 10]\n", "The length of result after: 4\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`result` starts off as an [empty list](glossary.html#empty-list) with a length of 0,\n", "and `current` starts off as zero.\n", "Each iteration of the loop\n", "adds the next value in the list `data` to `current` to calculate the running total.\n", "It then appends this value to `result`,\n", "so that when the program finishes we have a complete list of partial sums." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we want to double the values in `data` in place?\n", "We could try this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3] # re-initialize our sample data\n", "for n in data:\n", " n = 2 * n\n", "print 'doubled data:', data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "doubled data: [1, 4, 2, 3]\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "but as we can see,\n", "it doesn't work.\n", "When Python calculates `2*n`\n", "it creates a new value in memory.\n", "It then makes the variable `n` point at the value for a few microseconds\n", "before going around the loop again\n", "and pointing `n` at the next value from the list instead.\n", "Since nothing is pointing to the temporary value we just created any longer,\n", "Python throws it away." 
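] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can watch the same thing happen without a loop.\n", "In this small sketch (the variable names are only for illustration),\n", "`n` first refers to the same value as `data[0]`\n", "and is then pointed at a new value:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3]\n", "n = data[0]  # n refers to the same value as data[0]\n", "n = 2 * n    # n now refers to a new value; the list is untouched\n", "print n\n", "print data" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This should print `2` and then `[1, 4, 2, 3]`:\n", "assigning to `n` changes what `n` refers to,\n", "not what the list refers to." 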
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The right way to solve this problem is to use indexing and the `range` function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3]  # re-initialize our sample data\n", "for i in range(4):\n", "    data[i] = 2 * data[i]\n", "print 'doubled data:', data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "doubled data: [2, 8, 4, 6]\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again we have violated the DRY (Don't Repeat Yourself) principle by using `range(4)`:\n", "if we ever change the number of values in `data`,\n", "our loop will either fail because it tries to index beyond the end of a shorter list,\n", "or, what's worse,\n", "appear to succeed but not actually update the extra values in a longer one.\n", "Let's fix that:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3]  # re-initialize our sample data\n", "for i in range(len(data)):\n", "    data[i] *= 2\n", "print 'doubled data:', data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "doubled data: [2, 8, 4, 6]\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's better:\n", "`len(data)` is always the actual length of the list,\n", "so `range(len(data))` always gives exactly the indices we need.\n", "We've also rewritten the multiplication and assignment to use the in-place operator `*=`\n", "so that we aren't repeating `data[i]`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do this even more concisely using a *list comprehension*. 
This isn't exactly the same as the `for` loop solution above, since it creates a *new* list rather than modifying the existing one in place, but it is close enough for most purposes:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = [1, 4, 2, 3]  # re-initialize our sample data\n", "data = [n*2 for n in data]\n", "print 'doubled data:', data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "doubled data: [2, 8, 4, 6]\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also do a lot of other interesting things with lists,\n", "like *concatenate* them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "left = [1, 2, 3]\n", "right = [4, 5, 6]\n", "combined = left + right\n", "print combined" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[1, 2, 3, 4, 5, 6]\n" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "*count* how many times a particular value appears in them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = ['a', 'c', 'g', 'g', 'c', 't', 'a', 'c', 'g', 'g']\n", "print data.count('g')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "4\n" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "*sort* them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data.sort()\n", "print data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['a', 'a', 'c', 'c', 'c', 'g', 'g', 'g', 'g', 't']\n" ] } ], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and *reverse* them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data.reverse()\n", "print data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", 
"text": [ "['t', 'g', 'g', 'g', 'g', 'c', 'c', 'c', 'a', 'a']\n" ] } ], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### *A Health Warning*\n", "\n", "\n", "One thing that newcomers (and even experienced programmers) often trip over is that\n", "`sort` and `reverse` mutate the list,\n", "i.e.,\n", "they rearrange values within a single list\n", "rather than creating and returning a new list.\n", "If we do this:\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "sorted_data = data.sort()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 37 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "then all we have is the special value `None`,\n", "which Python uses to mean \"there's nothing here\":\n", "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print sorted_data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "None\n" ] } ], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "At some point or another,\n", "everyone types `data = data.sort()` and then wonders where their time series has gone…\n", "\n", "
" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Back to Multi-dimensional Color Grids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we know how to use lists,\n", "we're ready to load two-dimensional images from files.\n", "Here's our first test file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_3x3.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRG\r\n", "RGR\r\n", "GRR\r\n" ] } ], "prompt_number": 38 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and here's how we read it line by line with Python:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_3x3.txt', 'r') as source:\n", "    for line in source:\n", "        print line" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRG\n", "\n", "RGR\n", "\n", "GRR\n", "\n" ] } ], "prompt_number": 39 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whoops: we forgot to strip the newlines off the ends of the lines\n", "as we read them from the file,\n", "so `print` adds its own newline after the one each line already has\n", "and we get a blank line after every row.\n", "Let's fix that:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_3x3.txt', 'r') as source:\n", "    for line in source:\n", "        print line.strip()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRG\n", "RGR\n", "GRR\n" ] } ], "prompt_number": 40 }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's better.\n", "As this example shows,\n", "a `for` loop over a file reads the lines from the file one by one\n", "and assigns each to the loop variable in turn.\n", "If we want to get all the lines at once,\n", "we can do this instead:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_3x3.txt', 'r') as source:\n", "    lines = source.readlines()  # with an 's' on the end\n", "print lines" ], "language": "python", 
"metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "['RRG\\n', 'RGR\\n', 'GRR\\n']\n" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "`file.readlines` (with an 's' on the end to distinguish it from `file.readline`)\n", "reads the entire file at once\n", "and returns a list of strings,\n", "one per line.\n", "The length of this list tells us how many rows we need in our grid,\n", "while the length of the first line (minus the newline character)\n", "tells us how many columns we need:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('grid_3x3.txt', 'r') as source:\n", "    lines = source.readlines()\n", "height = len(lines)\n", "width = len(lines[0]) - 1\n", "print '{0}x{1} grid'.format(width, height)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "3x3 grid\n" ] } ], "prompt_number": 42 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Upon reflection,\n", "that's not a very good test case:\n", "since the grid is square,\n", "we can't tell whether we have `height` and `width` the right way around.\n", "Let's use a rectangular data file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_5x3.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRRGR\r\n", "RRGRR\r\n", "RGRRR\r\n" ] } ], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and put our code in a function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_size(filename):\n", "    with open(filename, 'r') as source:\n", "        lines = source.readlines()\n", "    return len(lines[0]) - 1, len(lines)\n", "\n", "width, height = read_size('grid_5x3.txt')\n", "print '{0}x{1} grid'.format(width, height)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "5x3 grid\n" ] } ], 
"prompt_number": 44 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As this example shows,\n", "a function can return several values at once\n", "(Python actually packs them into a single tuple).\n", "When it does,\n", "those values are matched against the caller's variables from left to right.\n", "This kind of multiple assignment can be done anywhere:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "red, green, blue = 255, 0, 128\n", "print 'red={0} green={1} blue={2}'.format(red, green, blue)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "red=255 green=0 blue=128\n" ] } ], "prompt_number": 46 }, { "cell_type": "markdown", "metadata": {}, "source": [ "and gives us an easy way to swap the values of two variables:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "low, high = 25, 10  # whoops\n", "low, high = high, low  # exchange their values\n", "print 'low={0} high={1}'.format(low, high)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "low=10 high=25\n" ] } ], "prompt_number": 47 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Back to our function…\n", "Rather than just returning sizes,\n", "it would be more useful for us to create and fill in a grid.\n", "As we're doing this,\n", "though,\n", "we must remember to strip the newlines off the strings we have read from the file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_grid(filename):\n", "    with open(filename, 'r') as source:\n", "        lines = source.readlines()\n", "    width, height = len(lines[0]) - 1, len(lines)\n", "    result = ImageGrid(width, height)\n", "    for y in range(len(lines)):\n", "        fill_grid_line(result, y, lines[y].strip())\n", "    return result" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 48 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the most complicated function we've written so far,\n", "so let's go through it step by step:\n", 
"\n", "1. Define `read_grid` to take a single parameter.\n", "2. Open the file named by that parameter and assign the file handle to `source`.\n", "3. Read all of the lines from the file at once and assign the resulting list to `lines`.\n", "4. Leave the `with` block, which closes the file automatically, then calculate the width and height of the grid.\n", "5. Create the grid.\n", "6. Loop over the lines.\n", "7. Fill in a single line of the grid using an as-yet-unwritten function called `fill_grid_line`.\n", "8. Once the loop is done, return the resulting grid.\n", "\n", "We need a new function `fill_grid_line`\n", "because the function we've been using,\n", "`color_from_string`,\n", "always colors row 0 of whatever grid it's given.\n", "We need something that can color any row we specify:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def fill_grid_line(grid, y, data):\n", " \"Color grid cells in row y red and green according to 'R' and 'G' in data.\"\n", " assert 0 <= y < grid.height, \\\n", " 'Row index {0} not within grid height {1}'.format(y, grid.height)\n", " assert grid.width == len(data), \\\n", " 'Grid and string lengths do not match: {0} != {1}'.format(grid.width, len(data))\n", " for x in range(grid.width):\n", " assert data[x] in 'GR', \\\n", " 'Unknown character in data string: \"{0}\"'.format(data[x])\n", " if data[x] == 'R':\n", " grid[x, y] = colors['Red']\n", " else:\n", " grid[x, y] = colors['Green']" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 49 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As well as adding an extra parameter `y` to this function,\n", "we've added an extra assertion to make sure it's at least 0 and less than the grid's height.\n", "In fact,\n", "we could have said,\n", "\"*Since* we're adding an extra parameter,\n", "we've added an extra assertion,\"\n", "since it's good practice to check every input to a function before using it." 
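, "\n",
"\n",
"To see the new assertion at work without building a grid,\n",
"here is a minimal sketch that pulls the row check out into a hypothetical helper, `check_row_index`,\n",
"which is not part of the lesson's code.\n",
"It uses the same test and message as `fill_grid_line`:\n",
"\n",
"```python\n",
"def check_row_index(y, height):\n",
"    \"Hypothetical helper: fail early if row y is outside a grid of the given height.\"\n",
"    assert 0 <= y < height, \\\n",
"        'Row index {0} not within grid height {1}'.format(y, height)\n",
"\n",
"check_row_index(2, 3)      # passes: a 3-row grid has rows 0, 1, and 2\n",
"try:\n",
"    check_row_index(3, 3)  # fails: there is no row 3 in a 3-row grid\n",
"except AssertionError as error:\n",
"    print(error)           # Row index 3 not within grid height 3\n",
"```\n",
"\n",
"Because `fill_grid_line` runs this check before coloring any cell,\n",
"a bad row index stops the function before it touches the grid.\n",
"(Python skips `assert` statements entirely when it is run with the `-O` flag.)"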
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's give our functions a try:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "rectangle = read_grid('grid_5x3.txt')\n", "rectangle.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 50 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perfect. Or is it?\n", "Take another look at our data file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_5x3.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRRGR\r\n", "RRGRR\r\n", "RGRRR\r\n" ] } ], "prompt_number": 51 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 'G' in the top row of the data file is on the right,\n", "but the green square in the top row of the grid is on the left.\n", "The green cell in the bottom row of the grid\n", "is also in the wrong place.\n", "Somehow,\n", "our grid appears to be upside down." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The problem is that we haven't used a consistent coordinate system.\n", "`ImageGrid` uses a Cartesian grid with the origin in the lower left and Y going upward,\n", "but we're treating the file as if the origin were at the top,\n", "just as it is in a spreadsheet.\n", "The simplest way to fix this is to reverse our list of lines before using it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_grid(filename):\n", " with open(filename, 'r') as source:\n", " lines = source.readlines()\n", " width, height = len(lines[0]) - 1, len(lines)\n", " result = ImageGrid(width, height)\n", " lines.reverse() # align with ImageGrid coordinate system\n", " for y in range(len(lines)):\n", " fill_grid_line(result, y, lines[y].strip())\n", " return result\n", "\n", "rectangle = read_grid('grid_5x3.txt')\n", "rectangle.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 54 }, { "cell_type": "markdown", "metadata": {}, "source": [ "All that's left is to make sure that all the lines are the same length\n", "so that we're warned of an error if we try to use a file like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat grid_ragged.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RRRGR\r\n", "RRGR\r\n", "RGR\r\n" ] } ], "prompt_number": 56 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we require all the lines to be the same length,\n", "we can compare each line's length (without its newline)\n", "against the width we calculated from the first line.\n", "We can do this in a loop of its own:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "for line in lines:\n", " assert len(line.strip()) == width\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or put the test in the loop that's filling the lines:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "for y in range(len(lines)):\n", " assert len(lines[y].strip()) == width\n", " fill_grid_line(result, y, lines[y].strip())\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first form checks every line before it makes any changes to the grid.\n", "Since we're creating the grid inside the function,\n", "though,\n", "this isn't a real worry:\n", "if there's an error in the file,\n", "our assertion will cause the function to fail\n", "and the partially initialized grid will never be returned to the caller.\n", "We will therefore use the second form,\n", "but modify it slightly so that we only call `strip` once (DRY again):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def read_grid(filename):\n", " \"Initialize a grid by reading lines of 'R' and 'G' from a file.\"\n", " with open(filename, 'r') as source:\n", " lines = source.readlines()\n", " width, 
height = len(lines[0]) - 1, len(lines)\n", " result = ImageGrid(width, height)\n", " lines.reverse()\n", " for y in range(len(lines)):\n", " string = lines[y].strip()\n", " assert len(string) == width, \\\n", " 'Line {0} is {1} long, not {2}'.format(y, len(string), width)\n", " fill_grid_line(result, y, string)\n", " return result" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 72 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As always,\n", "we're not done until we test our change:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "read_grid('grid_ragged.txt')" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "AssertionError", "evalue": "Line 0 is 3 long, not 5", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mread_grid\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'grid_ragged.txt'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mread_grid\u001b[0;34m(filename)\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0my\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlines\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mstring\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlines\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstrip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 10\u001b[0;31m \u001b[0;32massert\u001b[0m 
\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstring\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mwidth\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Line {0} is {1} long, not {2}'\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstring\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mwidth\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 11\u001b[0m \u001b[0mfill_grid_line\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstring\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 12\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAssertionError\u001b[0m: Line 0 is 3 long, not 5" ] } ], "prompt_number": 74 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And of course we should make sure that it still works for a valid file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "once_more = read_grid('grid_5x3.txt')\n", "once_more.show()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 75 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Thumbnails Revisited" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have all the concepts we need to create thumbnails for a set of images,\n", "and almost all the tools.\n", "The one remaining piece of the puzzle is the unpleasantly named `glob`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import glob\n", "print 'text files:', glob.glob('*.txt')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "text files: ['grid_3x3.txt', 'grid_5x3.txt', 'grid_ragged.txt', 'grid_rrgrr.txt']\n" ] } ], "prompt_number": 80 }, { "cell_type": "code", "collapsed": false, "input": [ "print 'IPython Notebooks:', glob.glob('*.ipynb')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "IPython Notebooks: ['python-0-resize-image.ipynb', 'python-1-functions.ipynb', 'python-2-loops-indexing.ipynb', 'python-3-conditionals-defensive.ipynb', 'python-4-files-lists.ipynb']\n" ] } ], "prompt_number": 81 }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"glob\" was originally short for \"global command\",\n", "but it has long since become a verb in its own right.\n", "It takes a single pattern string such as `'*.txt'` as a parameter\n", "and uses it to do [wildcard](glossary.html#wildcard) matching on filenames,\n", "returning a list of the matching names.\n", "That list is in no guaranteed order,\n", "so call `sorted` on it if order matters.\n", "Once we have this list,\n", "we can loop over it and create thumbnails one by one:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from skimage import novice\n", "from glob import glob\n", "\n", "DEFAULT_WIDTH = 100\n", "\n", "def make_all_thumbnails(pattern, width=DEFAULT_WIDTH):\n", " \"Create thumbnails for all image files matching the given pattern.\"\n", " for filename in glob(pattern):\n", " make_thumbnail(filename, 
width)\n", "\n", "def make_thumbnail(original_filename, width=DEFAULT_WIDTH):\n", " \"Create a thumbnail for a single image file.\"\n", " picture = novice.open(original_filename)\n", " new_height = int(picture.height * float(width) / picture.width)\n", " picture.size = (width, new_height)\n", " thumbnail_filename = 'thumbnail-' + original_filename\n", " picture.save(thumbnail_filename)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 83 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only thing that's really new here is the way we specify the default value for thumbnail widths.\n", "Since people might call both `make_all_thumbnails` and `make_thumbnail` directly,\n", "we want to be able to set the width for either.\n", "However,\n", "we also want their default values to be the same,\n", "so we define that value once near the top of the program\n", "and use it in both function definitions.\n", "By convention,\n", "\"constant\" values like `DEFAULT_WIDTH` are spelled in UPPER CASE\n", "to indicate that they shouldn't be changed\n", "(Python doesn't enforce this; it's purely a naming convention)." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Key Points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Open a file for reading using `file_handle = open(filename, 'r')` and close it using `file_handle.close()`, or use `with open(filename, 'r') as file_handle:` to have it closed automatically.\n", "- Opening a file using `open(...)` returns a file handle object that makes a connection between a program and the file.\n", "- Each line read from a file ends with a newline character (`\\n`), except possibly the last one.\n", "- Remove newline characters (and all other leading or trailing whitespace) in Python using `line.strip()`.\n", "- List elements are written in square brackets and separated by commas.\n", "- Lists are mutable objects: they can be changed after they are created.\n", "- Python lists have built-in methods including `count()`, `sort()`, and `reverse()`.\n" ] } ], "metadata": {} } ] }