Python For Spatial Data
May 4, 2013
Aberystwyth University
Institute of Geography and Earth Sciences.
Copyright © Pete Bunting and Daniel Clewley 2013. This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.
Acknowledgements
The authors would like to acknowledge the support of others, but specifically (and in no particular order) Prof. Richard Lucas, Sam Gillingham (developer of RIOS and the image viewer) and Neil Flood (developer of RIOS) for their support and time.
Authors
Peter Bunting
Dr Pete Bunting joined the Institute of Geography and Earth Sciences (IGES), Aberystwyth University, in September 2004 to study for his Ph.D.; upon its completion in the summer of 2007 he received a lectureship in remote sensing and GIS. Prior to joining the department, Peter received a BEng(Hons) in software engineering from the Department of Computer Science at Aberystwyth University. Pete also spent a year working for Landcare Research in New Zealand before rejoining IGES in 2012 as a senior lecturer in remote sensing.
Contact Details
EMail: pfb@aber.ac.uk
Senior Lecturer in Remote Sensing
Institute of Geography and Earth Sciences
Aberystwyth University
Aberystwyth
Ceredigion
SY23 3DB
United Kingdom
Daniel Clewley
Dr Dan Clewley joined IGES in 2006 to undertake an MSc in Remote Sensing and GIS. Following his MSc, Dan undertook a Ph.D. entitled "Retrieval of Forest Biomass and Structure from Radar Data using Backscatter Modelling and Inversion" under the supervision of Prof. Lucas and Dr. Bunting. Prior to joining the department, Dan completed his BSc(Hons) in Physics at Aberystwyth University. Dan is currently a post-doctoral researcher at the University of Southern California.
Contact Details
Email: clewley@usc.edu
Postdoctoral Research Associate
Microwave Systems, Sensors, and Imaging Lab (MiXIL)
Ming Hsieh Department of Electrical Engineering
The University of Southern California
Los Angeles
USA
Table of Contents
1 Introduction
  1.1 Background
    1.1.1 What is Python?
    1.1.2 What can it be used for?
    1.1.3 A word of warning
  1.3 Python Libraries
  1.4 Installing Python
  1.5 Text Editors
    1.5.1 Windows
    1.5.2 Linux
    1.5.3 Mac OSX
    1.5.4 Going between Windows and UNIX
  1.6 Starting Python
    1.6.1 Indentation
    1.6.2 Keywords
    1.6.3 File Naming
    1.6.4 Case Sensitivity
    1.6.5 File paths in examples
    1.6.6 Independent Development of Scripts
    1.6.7 Getting Help
  1.7 Further Reading
  2.1 Hello World Script
  2.2 Comments
  2.3 Variables
    2.3.1 Numbers
    2.3.2 Boolean
    2.3.3 Text (Strings)
    2.3.4 Example using Variables
  2.4 Lists
  2.5 IF-ELSE Statements
  2.6 Looping
3 Text Processing
  3.3 Programming Styles
    3.3.1 Procedural Programming File Outline
    3.3.2 Object Orientated Programming File Outline
  3.4 Object Oriented Script
    3.4.1 Object Oriented Script for Text File Processing
5 Plotting - Matplotlib
  5.1 Introduction
  5.2 Simple Script
  5.3 Bar Chart
  5.4 Pie Chart
  5.5 Scatter Plot
  5.6 Line Plot
  5.7 Exercise
  5.8 Further Reading
  7.1 Introduction
  7.2 Merging ESRI Shapefiles
  7.3 Convert Images to GeoTIFF using GDAL
    7.3.1 Passing Inputs from the Command Line into your script
  8.1 Reading and Updating Header Information
    8.1.1 Reading Image Headers
    8.1.2 Read image header example
    8.1.3 No Data Values
    8.1.4 Band Name
    8.1.5 GDAL Meta-Data
    8.2.1 Getting Help Reminder
    8.2.2 Band Maths
    8.2.3 Multiply by a constant
    8.2.4 Calculate NDVI
    8.2.5 Calculate NDVI Using Multiple Images
  8.3 Filtering Images
  8.4 Apply a rule based classification
  8.5 Exercises
  8.6 Further Reading
  9.1 Reading Columns
  9.2 Writing Columns
    9.2.1 Calculating New Columns
    9.2.2 Add Class Name
  9.3 Adding a colour table
  9.4 Using RATs for rule based classifications
    9.4.1 Developing a rule base
  10.1 Introduction
  10.2 Model Output
  10.3 Reading Parameters
  10.4 The Model
  10.5 Exporting Data
  10.6 Creating Plots
  10.7 Exercises
  10.8 Further Reading
A RSGISLib
  A.1 Introduction to RSGISLib
  A.2 Using RSGISLib
    A.2.1 The RSGISLib XML Interface
  A.3 Segmentation
    A.3.1 XML Code
  A.4 Populating Segments
List of Figures
5.1 A simple plot using matplotlib.
5.2 A simple bar chart using matplotlib.
5.3 A simple pie chart using matplotlib.
5.4 A simple scatter plot using matplotlib.
5.5 Rainfall data for summer and winter on the same axis.
5.6 Rainfall data for summer and winter on different axes.
6.1 A simple plot using matplotlib.
List of Tables
1.1 Keywords within the Python language
2.1 The mathematical functions available within python.
2.2 Logic statements available within python
3.1 Options when opening a file.
6.1 Coefficients for estimating volume and the specific gravity required for estimating the biomass by species.
Chapter 1 Introduction
1.1 Background
1.1.1 What is Python?
Python is a high level scripting language which is interpreted, interactive and object-oriented. A key attribute of python is its clear and understandable syntax, which should allow you to quickly get up to speed and develop useful applications, while the syntax is similar enough to lower level languages (for example C/C++ and Java) to provide a background from which you can grow your expertise. Python is also a so-called memory managed language, meaning that you, the developer, are not directly in control of the memory usage within your application, which makes development much simpler. That is not to say that memory usage never needs to be considered, or that you cannot influence the memory footprint of your scripts, but these details are beyond the scope of this course. Python is cross-platform, with support for Windows, Linux, Mac OS X and most other UNIX platforms. In addition, many libraries (e.g., purpose built and external C++ libraries) are available to python, and it has become a very popular language for many applications, including on the internet and within remote sensing and GIS.
CHAPTER 1. INTRODUCTION
1.1.2 What can it be used for?
Python can be used for almost any task, from simple file operations and text manipulation to image processing. It may also be used to extend the functionality of other, larger applications.
1.1.3 A word of warning
There are a number of different versions of python and these are not always compatible. For these worksheets we will be using version 3.X (at the time of writing the latest version is 3.3.0). With the exception of the quiz in Chapter 2, where raw_input must be used instead of input, the examples will also work with python 2.7. One of the most noticeable differences between python 2 and python 3 is that the print statement is now a function. So whilst:
print "Hello World"
will work under python 2, scripts using it won't run under python 3 and must use:
print("Hello World")
instead. As the second form is backwards compatible with python 2, it is good practice to use it even if you are working with python 2.
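Where a script has to run under both python 2 and python 3 (for example, the Chapter 2 quiz, which needs raw_input under python 2), one common pattern is sketched below. It is not part of the original worksheets; the name ask is a hypothetical choice, and the __future__ import must be the first statement in the file.

```python
from __future__ import print_function  # makes print a function under python 2.6/2.7; no effect in python 3

try:
    # python 2: raw_input returns the text typed by the user as a string
    ask = raw_input
except NameError:
    # python 3: raw_input was removed and input now behaves like python 2's raw_input
    ask = input

print("Hello World")
# answer = ask("Do you like python? (y/n)\n")
```

The rest of the script can then call ask() everywhere it needs user input, whichever version of python is running it.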
1.2
1.2.1
Many applications have been built in python and a quick search of the web will reveal the extent of this range. Commonly, applications developed solely in python are web applications, run from within a web server (e.g., Apache; http://httpd.apache.org with http://www.modpython.org), but desktop applications and data processing software such as viewer (https://bitbucket.org/chchrsc/viewer) and RIOS (https://bitbucket.org/chchrsc/rios) have also been developed. In large standalone applications, python is often used to facilitate the development of plugins or extensions to the application. Examples of python used in this form include ArcMap and SPSS. For a list of applications supporting or written in python, refer to the following website: http://en.wikipedia.org/wiki/Python_software.
1.3 Python Libraries
Many libraries are available to python. Libraries are collections of functions which can be called from your script(s). Python provides an extensive standard library (http://docs.python.org/lib/lib.html), but third parties have also developed additional libraries to provide specific functionality (e.g., plotting). A list of useful libraries is available from http://wiki.python.org/moin/UsefulModules and by following the links provided on that page. The following sites provide links to libraries and packages specific to remote sensing and GIS, many of which are open source with freely available software packages and libraries for use with python:
http://freegis.org
http://opensourcegis.org
http://www.osgeo.org
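To use the functions in a library, the library is first imported and its functions are then called through the library name. A minimal sketch using two modules from the standard library (the file path is a made-up example):

```python
import math      # standard library module of mathematical functions
import os.path   # standard library module for manipulating file paths

# Call functions provided by the imported libraries
root = math.sqrt(16.0)
name = os.path.basename("/data/images/scene1.tif")
print(root, name)   # 4.0 scene1.tif
```

Third-party libraries such as numpy or GDAL are imported in exactly the same way once they are installed.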
1.4 Installing Python
For this tutorial, Python is required alongside the libraries GDAL (http://www.gdal.org), numpy (http://www.numpy.org), scipy (http://www.scipy.org), RIOS (https://bitbucket.org/chchrsc/rios) and matplotlib (http://matplotlib.sourceforge.net), all built against python 3. Python, alongside these packages, can be installed on almost any platform. For Windows, a python package which includes all the libraries required for this worksheet other than RIOS is available, for free, as a simple download from http://www.pythonxy.com. To install this package, download the installation file and run it, selecting a full installation. Currently this package only supports python 2.X, not the new python 3; however, this is unlikely to cause problems for the worksheet. For further details of the installation process please see the project website http://www.pythonxy.com. PythonXY is also available for Linux (https://code.google.com/p/pythonxy-linux), but all these packages are commonly available for the Linux platform through the distribution's package management system or can be built from source (harder but recommended). For Mac OSX, the KyngChaos Wiki (http://www.kyngchaos.com/software/frameworks) makes various binary packages available for installing GDAL etc., and Enthought Canopy (https://www.enthought.com/products/canopy/) includes many of the tools required for this course. As with PythonXY, only python 2.X is currently supported. If you would like to use python 3, packages can be built from source. More details on installing individual packages are available at http://docs.python.org/3/install/.
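Once everything is installed, a quick way to confirm that python can find each library is to try importing it. The sketch below is a hypothetical helper, not part of the course material; it assumes the module names osgeo.gdal (GDAL's python bindings), numpy, scipy, rios and matplotlib.

```python
import importlib
import sys

# Module names assumed for each required package
REQUIRED = ["osgeo.gdal", "numpy", "scipy", "rios", "matplotlib"]

def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    for name, ok in check_modules(REQUIRED).items():
        print(name, "OK" if ok else "MISSING")
```

Any package reported as MISSING needs to be installed (or built) against the same python that runs the script.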
1.5 Text Editors
To write your Python scripts a text editor is required. A simple text editor such as Microsoft's Notepad will do, but it is recommended that you use a syntax aware editor that will colour, and in some cases format, your code automatically. There are many text editors available for each operating system and it is up to you to choose one, although recommendations are made below.
1.5.1 Windows
The recommended editor is Spyder, which is installed as part of the python(x,y) package. From within Spyder you can run your python scripts directly (using the run button); additionally, it will alert you to errors within your scripts before you run them. Alternatively, the notepad++ (http://notepad-plus.sourceforge.net) text editor can be used. Notepad++ is a free, open source text editor and can therefore be downloaded and installed onto any Windows PC. If you use this editor, it is recommended that you change the settings for python to use spaces instead of tabs using the following steps:
1. Go to Settings > Preferences
2. Select Language Menu / Tab Settings
3. Under Tab Settings for python, tick Replace by space
1.5.2 Linux
Under Linux, either the command line editor ne (nice editor), vi or its graphical interface equivalent gvim is recommended, but kdeveloper, gedit and many others are also good choices.
1.5.3 Mac OSX
Under Mac OSX either BBEdit, SubEthaEdit or TextMate are recommended, while the freely available TextWrangler is also a good choice. The command line editors ne and vi are also available under OS X.
1.5.4 Going between Windows and UNIX
If you are writing your scripts on Windows and transferring them to a UNIX/Linux machine to be executed (e.g., a High Performance Computing (HPC) environment), you need to be careful with the line endings (the invisible symbols defining the end of a line within a file), as these differ between operating systems. In notepad++ the line ending can be set to UNIX, and this is recommended where scripts are being composed under Windows. Alternatively, if RSGISLib is installed, the command flip can be used to convert the line endings; the example below converts to UNIX line endings.
flip -u InputFile.py
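If flip is not available, the same conversion can be done with a few lines of python. This is a minimal sketch rather than a replacement for flip; the function names are hypothetical. The file is read and written in binary mode so that python does not translate the line endings itself.

```python
def unix_line_endings(data):
    """Return bytes with Windows (CRLF) and classic Mac (CR) line endings replaced by UNIX (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def convert_file(path):
    """Rewrite the file at `path` in place using UNIX line endings."""
    with open(path, "rb") as f:   # binary mode: read the raw bytes, endings untouched
        data = f.read()
    with open(path, "wb") as f:
        f.write(unix_line_endings(data))

# Usage: convert_file("InputFile.py")
```

CRLF pairs are replaced before lone CR characters, so a Windows line ending becomes a single LF rather than two.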
1.6 Starting Python
To start python, type python at a command prompt (Windows) or terminal (UNIX); alternatively, select python(x,y) > Command Prompts > Python interpreter from the Windows start menu. This opens python in interactive mode, where it is possible to perform some basic maths; try:
>>> 1 + 1
2
To exit type:
>>> exit()
Performing more complex tasks in python often requires a large number of commands, so it is more convenient to create a text file containing the commands, referred to as a script.
1.6.1 Indentation
There are several basic rules of syntax which you need to know to develop scripts within python. The first of these is code layout. To provide the structure of the script, Python uses indentation. Indentation can be in the form of tabs or spaces, but whichever is used needs to be consistent throughout the script. The most common, and recommended, convention is to use 4 spaces for each level of indentation. The example given below shows an if-else statement: the statement executed when the if condition is true is indented from the rest of the script, as is the statement in the corresponding else part. You will see this indentation as you go through the examples, and it is important that you follow the indentation shown in the examples or your scripts will not execute.
x = 1
if x == 1:
    x = x + 1
else:
    x = x - 1
1.6.2 Keywords
As with all scripting and programming languages, python has a set of keywords, which have special meanings to the compiler or interpreter when the code is executed. As with all python code, these keywords are case sensitive, i.e., else is a keyword but Else is not. A list of python 3's keywords is given below (later versions, from python 3.7, also reserve async and await). Note that exec and print were keywords in python 2 but are functions in python 3.
Table 1.1: Keywords within the Python language
False     None      True      and       as
assert    break     class     continue  def
del       elif      else      except    finally
for       from      global    if        import
in        is        lambda    nonlocal  not
or        pass      raise     return    try
while     with      yield
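Rather than memorising the table, you can ask python itself which words are reserved, using the keyword module from the standard library. A short sketch:

```python
import keyword

# keyword.kwlist holds the reserved words of the python version running the script
print(keyword.kwlist)

# Keywords are case sensitive: 'else' is reserved but 'Else' is an ordinary name
print(keyword.iskeyword("else"))   # True
print(keyword.iskeyword("Else"))   # False
print(keyword.iskeyword("print"))  # False in python 3, where print is a function
```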
1.6.3 File Naming
It is important that you use sensible and identifiable names for all the files you generate throughout these tutorial worksheets; otherwise you will not be able to identify the scripts at a later date. Additionally, it is highly recommended that you do not include spaces in file names or in the directory path you use to store the files generated during this tutorial.
1.6.4 Case Sensitivity
Something else to remember when using python is that the language is case sensitive; therefore, if a name is in lowercase then it needs to remain in lowercase everywhere it is used. For example:
VariableName is not the same as variablename
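The sketch below shows the consequence: names differing only in capitalisation are separate variables, and a misspelt capitalisation is simply an unknown name.

```python
variablename = 10
VariableName = 20    # a different variable, because the capitalisation differs

print(variablename, VariableName)   # 10 20

try:
    print(VARIABLENAME)             # never assigned, so python cannot find it
except NameError as error:
    print("Error:", error)
```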
1.6.5 File paths in examples
In the examples provided (in the text), file paths are given as ./PythonCourse/TutorialX/File.xxx. When writing these scripts out for yourself, you will need to update these paths to the location on your machine where the files are located (e.g., /home/pete.bunting or C:\). Please note that it is recommended that you do not have any spaces within your file paths. In the example (answer) scripts provided, no file path has been written, and you will therefore need to either save input and output files in the same directory as the script or provide the path to the file. Please note that under Windows you need to insert a double backslash (i.e., \\) within the file path, as a single backslash is an escape character (e.g., \n for new line) within strings.
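The escape-character problem is easy to see in practice. The sketch below uses a hypothetical Windows path and shows the doubled-backslash form, python's raw strings (prefix r, where backslashes are kept literally) as an alternative, and os.path.join, which inserts the correct separator for the operating system the script runs on.

```python
import os.path

# Two ways to write the same (hypothetical) Windows path in a python string
escaped = "C:\\PythonCourse\\Tutorial1\\File.txt"   # each backslash doubled
raw = r"C:\PythonCourse\Tutorial1\File.txt"         # raw string: backslashes kept literally
print(escaped == raw)                                # True

# A single backslash followed by 'n' or 't' is read as a newline or tab, not part of the path
broken = "C:\new\table.txt"
print("\n" in broken)                                # True

# os.path.join uses the separator of the current operating system
joined = os.path.join("PythonCourse", "Tutorial1", "File.txt")
print(joined)
```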
1.6.6 Independent Development of Scripts
There is a significant step to be made from working your way through notes and examples, such as those provided in this tutorial, to independently developing your own scripts from scratch. Our recommendation for this, and when undertaking the exercises from this tutorial, is to take it slowly and think through the steps you need to undertake to perform the operation(s) you need. I would commonly first write the script using comments or on paper, breaking the process down into the major steps required. For example, if I were asked to write a script to uncompress a directory of files into another directory, I might write the following outline, where I use indentation to indicate where a process is part of the parent:
# Get input directory (containing the compressed files)
# Get output directory (where the files, once uncompressed, will be placed).
# Retrieve list of all files (to be uncompressed) in the input directory.
# Iterate through input files, uncompressing each in turn.
    # Get single file from list
    # create command line command for the current file
    # execute command
Writing the process out in this form makes translating it into python much simpler, as you only need to think about how to do small individual elements in python and not how to do the whole process in one step.
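To illustrate, the outline above might be filled in as follows. This is only a sketch under stated assumptions: the compressed files are taken to be .tar.gz archives, the tar command line tool is assumed to be installed, and the function names are hypothetical. Each comment from the outline is kept above the line that implements it.

```python
import os
import subprocess

def build_command(input_file, output_dir):
    """Create the command line command that uncompresses one file.

    Assumes the files are .tar.gz archives and that the 'tar' tool is installed.
    """
    return ["tar", "-xzf", input_file, "-C", output_dir]

def uncompress_directory(input_dir, output_dir):
    """Uncompress every .tar.gz file in input_dir into output_dir."""
    # Retrieve list of all files (to be uncompressed) in the input directory.
    compressed = [f for f in os.listdir(input_dir) if f.endswith(".tar.gz")]
    # Iterate through input files, uncompressing each in turn.
    for name in compressed:
        # Get single file from list
        input_file = os.path.join(input_dir, name)
        # Create command line command for the current file
        command = build_command(input_file, output_dir)
        # Execute command
        subprocess.call(command)

# Usage (hypothetical input and output directories):
# uncompress_directory("./compressed", "./uncompressed")
```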
1.6.7 Getting Help
Python provides a very useful help system through the command line. To access it, run python from the terminal:
> python
To exit the help system just press the q key on the keyboard.
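At the >>> prompt, help() can be called with any module, function or method (for example help(str.strip)), and the documentation is shown in a pager which you leave with q. The same documentation can also be retrieved as a string using the standard library's pydoc module; a minimal sketch:

```python
import pydoc

# render_doc returns the documentation text that help() would display;
# the plaintext renderer avoids terminal bold formatting characters
text = pydoc.render_doc(str.strip, renderer=pydoc.plaintext)
print(text)
```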
1.7 Further Reading
An Introduction to Python, G. van Rossum, F.L. Drake, Jr. Network Theory ISBN 0-95-416176-9 (Also available online: http://docs.python.org/3/tutorial/). Chapters 1-3
Python FAQ http://docs.python.org/faq/general.html
Python on Windows http://docs.python.org/faq/windows
How to Think Like a Computer Scientist: Python Edition http://www.greenteapress.com/thinkpython/
2.1 Hello World Script
To create your first python script, create a new text file using your preferred text editor and enter the text below:
#! /usr/bin/env python
#######################################
# A simple Hello World Script
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

print("Hello World")
Save your script to a file (e.g., helloworld.py) and then run it using either a command prompt (Windows) or a Terminal (UNIX), with the following command:
> python helloworld.py
Hello World
To get a command prompt under Windows, type cmd into the run dialog box in the start menu (Start > Run); further hints for using the command prompt are given below. Under OS X, Terminal is located within the Utilities folder in Applications. If you are using Spyder to create your Python scripts, you can run them by clicking the run button.
Hints for using the Windows command line:
cd allows you to change directory, e.g., cd directory1\directory2
dir allows you to list the contents of a directory, e.g., dir
To change drives, type the drive letter followed by a colon, e.g., D:
If a file path has spaces, you need to use quotes, e.g., to change directory: cd "Directory with spaces in name\another directory\"
2.2 Comments
In the above script there is a heading detailing the script's function, author and version. These lines are preceded by a hash (#), which tells the interpreter that they are comments and not part of the code. Any line starting with a hash is a comment, and any text following a hash (outside of a string) on a line of code is also treated as a comment. Comments are used to annotate the code; all examples in this tutorial use comments to describe the code, and it is recommended that you use comments in your own code.
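The three situations are shown in this short sketch: a whole-line comment, a comment after code, and a hash inside quotes, which is part of the string rather than a comment.

```python
# This whole line is a comment and is ignored by the interpreter
total = 2 + 3                # everything after the hash is a comment too
label = "# not a comment"    # a hash inside quotes is part of the string
print(total, label)          # 5 # not a comment
```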
2.3 Variables
The key building blocks within all programming languages are variables. Variables allow data to be stored, either temporarily for use in a single operation or throughout the whole program (global variables). Within python the data type of a variable does not need to be specified; it is set by the value assigned to it. Therefore, if an integer (i.e., whole number) is assigned to a variable, that variable will be an integer until a value of a different type is assigned to it. Examples defining variables are provided below:
name = "Pete"   # String
age = 25        # Integer
height = 6.2    # Float
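The built-in type() function reports the type python has chosen for each variable, and shows that assigning a value of a different type changes it. A short sketch (the second value for age is made up):

```python
name = "Pete"    # String
age = 25         # Integer
height = 6.2     # Float

# type() reports the data type python has given each variable
print(type(name), type(age), type(height))

# Re-assigning a value of a different type changes the variable's type
age = "twenty five"
print(type(age))   # <class 'str'>
```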
2.3.1 Numbers
There are three types of numbers within python:
Integers are the most basic form of number and contain only whole numbers. Note that in python 3, dividing one integer by another with / gives a floating point result (e.g., 7 / 2 gives 3.5); use // for integer (floor) division. In python 2, / between two integers performed integer division.
Floating point numbers provide support for storing all those numbers which are not whole numbers.
Complex numbers are defined as a + bj, where a is the real part and b the imaginary part, e.g., 4.5 + 2.5j, 4.5 - 2.5j or -4.5 + 2.5j.
The syntax for defining variables to store these data types is always the same, as python resolves the suitable type for the variable. Python allows a range of mathematical operations to be applied to numbers; these are listed in Table 2.1.
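The sketch below shows each number type and the division behaviour of python 3 described above, along with a few of the functions from Table 2.1.

```python
whole = 7                 # integer
fraction = 2.5            # floating point number
z = 4.5 - 2.5j            # complex number: real part 4.5, imaginary part -2.5

print(whole / 2)          # 3.5: in python 3, / always gives a float
print(whole // 2)         # 3: // performs integer (floor) division
print(whole * fraction)   # 17.5: mixing an int and a float gives a float
print(z.real, z.imag)     # 4.5 -2.5
print(abs(-3), pow(2, 3), round(3.14159, 2))   # 3 8 3.14
```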
2.3.2 Boolean
The boolean data type is the simplest and just stores a True or False value; an example of the syntax is given below:
Table 2.1: The mathematical functions available within python.
Function               Operation
x + y                  x plus y
x - y                  x minus y
x * y                  x multiplied by y
x / y                  x divided by y
x ** y                 x to the power of y
int(obj)               convert string to int
long(obj)              convert string to long (python 2 only; python 3 integers have no size limit)
float(obj)             convert string to float
complex(obj)           convert string to complex
complex(real, imag)    create complex from real and imaginary components
abs(num)               returns absolute value
pow(num1, num2)        raises num1 to num2 power
round(float, ndig=0)   rounds float to ndig places
moveForwards = True
moveBackwards = False
2.3.3 Text (Strings)
To store text the string data type is used. Although not a base data type like a float or int, a string can be used in the same way. The difference is that a string also provides functions (methods) to manipulate it, in the same way as an object. A comprehensive list of the functions available for a string is given in the python documentation http://docs.python.org/lib/string-methods.html. These functions are called directly on the string using a dot (e.g., stringVariable.strip()); in python 3 no module needs to be imported to use them. Copy the example below and save it as StringExamples.py. When you run this script, observe the change in the printed output and use the python documentation to identify what each of the functions lstrip(), rstrip() and strip() does.
stringVariable + '\'')

stringVariable_lstrip = stringVariable.lstrip()
print('lstrip: \'' + stringVariable_lstrip + '\'')

stringVariable_rstrip = stringVariable.rstrip()
print('rstrip: \'' + stringVariable_rstrip + '\'')

stringVariable_strip = stringVariable.strip()
print('strip: \'' + stringVariable_strip + '\'')
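The opening lines of the example above have not survived, so the sketch below is a self-contained version to run and compare against. The value of stringVariable is an assumption: a string with spaces at both ends, so that the effect of each method is visible between the quotes.

```python
# Hypothetical example string with spaces at both ends
stringVariable = "   Hello World   "
print("Original: '" + stringVariable + "'")

stringVariable_lstrip = stringVariable.lstrip()
print("lstrip: '" + stringVariable_lstrip + "'")

stringVariable_rstrip = stringVariable.rstrip()
print("rstrip: '" + stringVariable_rstrip + "'")

stringVariable_strip = stringVariable.strip()
print("strip: '" + stringVariable_strip + "'")

# Other string methods are called in the same way
print(stringVariable_strip.upper())
print(stringVariable_strip.replace("World", "Python"))
```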
2.3.4 Example using Variables
An example script illustrating the use of variables is provided below. It is recommended that you copy this script and execute it, making sure you understand each line. In addition, try making the following changes to the script:
1. Add your own questions.
2. Include the person's name within the questions.
3. Remove the negative marking.
#! /usr/bin/env python
#######################################
# A simple script illustrating the use of
# variables.
# Author: <YOUR NAME>
print("Question 7:")
answer = input("Adobe Photoshop provides the same \
functionality as eCognition.\n")
if answer == "y":
    print("Bad Luck")
    score = score - 1
else:
    print("Well done")
    score = score + 1

print("Question 8:")
answer = input("Python can be executed within \
a Java virtual machine.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

print("Question 9:")
answer = input("Python is a scripting language \
not a programming language.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

print("Question 10:")
answer = input("Aberystwyth is within Mid Wales.\n")
if answer == "y":
    print("Well done")
    score = score + 1
else:
    print("Bad Luck")
    score = score - 1

# Finally print out the user's final score.
2.4 Lists
Each of the data types outlined above only stores a single value at any one time; to store multiple values in a single variable a sequence data type is required. Python offers the list type, which allows any data type to be stored in a sequence and even supports the storage of objects of different types within one list. The string data type is also a sequence data type and therefore many of the same operations are available. Lists are very flexible structures and support a number of ways to create, append and remove content from the list, as shown below. Items in the list are numbered consecutively from 0 to n, where n is one less than the length of the list. Additional functions are available for lists (e.g., len(aList), aList.sort(), aList.reverse()) and these are described in http://docs.python.org/lib/typesseq.html and http://docs.python.org/lib/typesseq-mutable.html.
2.4.1 List Examples
#! /usr/bin/env python
#######################################
# Example with lists
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Create List:
aList = list()
anotherList = [1, 2, 3, 4]
emptyList = []
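The remainder of the list example has not survived. The sketch below continues from the same three lists and shows common ways to add, access and remove items; the values added are made up for illustration.

```python
aList = list()                 # empty list created with list()
anotherList = [1, 2, 3, 4]     # list created with values
emptyList = []                 # empty list created with square brackets

aList.append("Pete")           # add an item to the end
aList.append(25)               # items of different types can share a list
aList.insert(0, 6.2)           # insert an item at index 0
print(aList)                   # [6.2, 'Pete', 25]

print(anotherList[0], anotherList[-1])   # 1 4 (first and last items)
print(anotherList[1:3])        # [2, 3] (a slice: indexes 1 and 2)
print(len(anotherList))        # 4

anotherList.remove(3)          # remove the first item equal to 3
del anotherList[0]             # remove the item at index 0
print(anotherList)             # [2, 4]

anotherList.reverse()          # reverse the order in place
print(anotherList)             # [4, 2]
anotherList.sort()             # sort in place
print(anotherList)             # [2, 4]
```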
2.4.2 n-dimensional list
Additionally, n-dimensional lists can be created by inserting lists into a list; a simple example of a 2-d structure is given below. This type of structure can be used to store images (e.g., the example given below would form a grey scale image), and additional list dimensions could be added for additional image bands.
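The sketch below (with made-up pixel values) stores a small grey scale image as a list of rows, accesses a pixel with two indexes, and adds a third dimension for a second image band.

```python
# A 3 x 4 grey scale "image" stored as a list of rows (hypothetical pixel values)
image = [[10, 12, 15, 11],
         [20, 22, 25, 21],
         [30, 32, 35, 31]]

numRows = len(image)        # 3
numCols = len(image[0])     # 4
print(image[1][2])          # 25: row 1, column 2 (indexes start at 0)

image[0][0] = 99            # change a single pixel value

# A multi-band image adds a third dimension: one 2-d list per band
band1 = [[1, 2], [3, 4]]
band2 = [[5, 6], [7, 8]]
multiband = [band1, band2]
print(multiband[1][0][1])   # 6: band index 1, row 0, column 1
```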
2.5 IF-ELSE Statements
As already illustrated in the earlier quiz example, the ability to make a decision is key to any software. The basic construct for decision making in most programming and scripting languages is the if-else statement. Python uses the following syntax for if-else statements.
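A runnable example of the if / elif / else form, using a hypothetical value of x: each logic statement is tested in turn, and only the block under the first one that is true is executed.

```python
x = 7   # hypothetical value to test

if x > 10:
    category = "large"
elif x > 5:
    category = "medium"   # executed: x > 10 is false, x > 5 is true
else:
    category = "small"

print("x is " + category)   # x is medium
```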
Logic statements return a true or false value. If a value of true is returned, the contents of the if statement will be executed and the remaining parts of the statement will be ignored. If a false value is returned, the if part of the statement will be ignored and the next logic statement will be evaluated, until either one returns a true value or an else statement is reached.
2.5.1 Logic Statements
Table 2.2 outlines the main logic statements used within python in addition to these statements functions which return a boolean value can also be used to for decision making, although these will be described in later worksheets. Table 2.2: Logic statements available within python Function Operation Example == equals expr1 == expr2 > greater than expr1 > expr2 < less than expr1 < expr2 >= greater than and equal to expr1 expr2 <= less than and equal to expr1 expr2 not logical not not expr and logical and expr1 and expr2 or logical or expr1 or expr2 is is the same object expr1 is expr2
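Each statement in Table 2.2 can be tried directly, as in the sketch below (the values are arbitrary). The last part shows the difference between == (equal values) and is (the same object), which is easy to confuse.

```python
a = 5
b = 10

print(a == b)              # False
print(a < b)               # True
print(a >= 5)              # True
print(not a == b)          # True: 'not' applies to the result of a == b
print(a < b and b < 20)    # True
print(a > b or b == 10)    # True

# '==' compares values, 'is' checks whether two names refer to the same object
list1 = [1, 2, 3]
list2 = [1, 2, 3]
list3 = list1
print(list1 == list2)      # True: same contents
print(list1 is list2)      # False: two separate list objects
print(list1 is list3)      # True: both names refer to one object
```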
2.6
Looping
In addition to if-else statements for decision making, loops provide another key component of any program or script. Python offers two forms of loop, while and for. In many cases either can be used, depending on the developer's preference and the information available. Both types are outlined below.
2.6.1
while Loop
The basic syntax of the while loop is very simple (shown below). A logic statement is evaluated before each pass through the loop, and the loop terminates when it returns False.
while <logic statement>:
    statements
Therefore, a variable used in the logic statement needs to be altered within the loop to allow the loop to terminate. Below is an example of a while loop that counts from 0 to 10.
#! /usr/bin/env python
#######################################
# A simple example of a while loop
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

count = 0
while count <= 10:
    print(count)
    count = count + 1
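A while loop is a natural fit when the number of iterations is not known before the loop starts. In this hypothetical sketch, the loop counts how many times a value can be halved before it drops below 1:

```python
# The number of passes through the loop depends on the data,
# so a logic statement (rather than a fixed counter) ends the loop.
value = 100.0
halvings = 0
while value >= 1:
    value = value / 2
    halvings = halvings + 1
print(halvings)  # 7
```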
2.6.2
for Loop
A for loop provides similar functionality to a while loop, but it manages the counter for termination itself. It steps through each item in a sequence and stops once every item has been visited. The syntax of the for loop is provided below:
for <variable> in <sequence>:
    statements
A common application of a for loop is iterating through a list; an example of this is given below:
#! /usr/bin/env python
#######################################
# A simple example of a for loop
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

aList = ["Pete", "Richard", "Johanna", "Philippa", "Sam", "Dan", "Alex"]

for name in aList:
    print("Current name is: " + name)
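Because the for loop manages its own counter, the earlier while loop that counts from 0 to 10 can be sketched with the built-in range(). The standard enumerate() function provides a counter alongside each list item. The names list is illustrative:

```python
# range(11) produces 0, 1, ..., 10; the for loop manages the counter itself.
for count in range(11):
    print(count)

# enumerate() supplies an index alongside each item of a list.
names = ["Pete", "Dan", "Sam"]
for index, name in enumerate(names):
    print(index, name)
```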
A more advanced example is given below, where two nested for loops are used to iterate through a list of lists.
#! /usr/bin/env python
#######################################
# Example with for loop and n-lists
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# Create List:
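The nested-loop pattern can be sketched as follows, assuming a small hypothetical 3 x 4 list of lists. The outer loop visits each row (inner list), and the inner loop visits each value within that row:

```python
# A hypothetical 3 x 4 list of lists (rows of values).
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

total = 0
# Outer loop: each row. Inner loop: each value in the current row.
for row in grid:
    for value in row:
        total = total + value
    print(row)

print("Sum of all values:", total)  # 78
```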
2.7
Exercises
During this tutorial you should have followed through each of the examples and experimented with the code to understand each of the components outlined. To test your understanding of all the material, you will now be asked to complete a series of tasks:
1. Update the quiz so the questions and answers are stored in lists which are iterated through as the script is executed.
2. Create a script that loops through the smiling face 2-d list of lists, flipping it so the face is upside down.
2.8
Further Reading
An Introduction to Python, G. van Rossum, F.L. Drake, Jr. Network Theory ISBN 0-95-416176-9 (Also available online - http://docs.python.org/3/tutorial/) - Chapters 4 and 5.
Spyder Documentation http://packages.python.org/spyder/
Python Documentation http://www.python.org/doc/
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall ISBN 0-13-226993-7
How to Think Like a Computer Scientist: Python Edition http://www.greenteapress.com/thinkpython/
Learn UNIX in 10 minutes http://freeengineer.org/learnUNIXin10minutes.html (Optional, but recommended if running on OS X / Linux)
An example of a script to read a text file is given below; copy this example out and use the numbers.txt file to test your script. Note that the numbers.txt file needs to be within the same directory as your Python script.
#! /usr/bin/env python
#######################################
# A simple example reading in a text file
# two versions of the script are provided
# to illustrate that there is not just one
# correct solution to a problem.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import string

# 1) Splits the text file into individual characters
# to identify the commas and parsing the individual
# tokens.
As you can see, reading a text file from within Python is a simple process. The first step is to open the file for reading. Option 'r' is used because the file is only going to be read; the other options are listed in Table 3.1. If the file is a text file, its contents can then be read a line at a time. If it is a binary file (e.g., TIFF or DOC), reading is more complicated and is not covered in this tutorial.
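A minimal sketch of line-by-line reading follows. It uses a with block, which closes the file automatically; this is a swapped-in idiom, not the tutorial's explicit open()/close(). The temporary file numbers_example.txt is a hypothetical stand-in for numbers.txt:

```python
import os
import tempfile

# Write a small comma separated file to read back (stands in for numbers.txt).
path = os.path.join(tempfile.gettempdir(), "numbers_example.txt")
with open(path, "w") as outFile:
    outFile.write("1,2,3\n4,5,6\n")

numbers = []
# 'r' opens the file for reading; the with block closes it automatically.
with open(path, "r") as inFile:
    for eachLine in inFile:
        # strip() removes the trailing newline before splitting on commas
        for token in eachLine.strip().split(","):
            numbers.append(int(token))

print(numbers)  # [1, 2, 3, 4, 5, 6]
os.remove(path)
```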
Table 3.1: Options when opening a file.
File Mode | Operations
r | Open for read
w | Open for write (truncate)
a | Open for write (append)
r+ | Open for read/write
w+ | Open for read/write (truncate)
a+ | Open for read/write (append)
rb | Open for binary read
wb | Open for binary write (truncate)
ab | Open for binary write (append)
rb+ | Open for binary read/write
wb+ | Open for binary read/write (truncate)
ab+ | Open for binary read/write (append)
Now you need to adapt one of the methods given in the script above so that numbers and words are split into separate lists. To do this you will need to use the isalpha() function alongside the isdigit() function. Edit the numbers.txt file to match the input shown below, then run your script; you should receive the output shown below:
Input:
1, 2,pete, 3, 4,dan,5, 6,7,8,richard,10,11,12,13
Output:
>python simplereadsplit.py
[1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]
['pete', 'dan', 'richard']
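The short sketch below shows how isdigit() and isalpha() behave on tokens produced by splitting a line like the one above. Leading spaces left over from the split make isdigit() return False, so stripping each token first matters. The sample strings are illustrative:

```python
tokens = "1, 2,pete, 3".split(",")
print(tokens)                  # ['1', ' 2', 'pete', ' 3']

# The leading space makes isdigit() return False, so strip first.
print(" 2".isdigit())          # False
print(" 2".strip().isdigit())  # True
print("pete".isalpha())        # True
print("pete".isdigit())        # False
# '.' is not a digit, so isdigit() alone will not recognise floats.
print("3.5".isdigit())         # False
```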
3.2
Writing to a text file is similar to reading from a file. When opening the file, two choices are available: to append to the file or to truncate it. Appending leaves any content already within the file untouched, while truncating removes any content already within the file. An example of writing a list to a file, with each list item on a new line, is given below.
#! /usr/bin/env python
#######################################
# A simple script writing a list to
# a text file, one item per line
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

aList = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten']

dataFile = open('writetest.txt', 'w')

for eachitem in aList:
    dataFile.write(str(eachitem) + '\n')

dataFile.close()
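The difference between truncating ('w') and appending ('a') can be checked with a short sketch. It uses a hypothetical temporary file named writetest_example.txt:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "writetest_example.txt")

# 'w' truncates: any existing content is removed before writing.
with open(path, "w") as dataFile:
    for eachitem in [1, 2, 3]:
        dataFile.write(str(eachitem) + "\n")

# 'a' appends: existing content is kept and new lines go at the end.
with open(path, "a") as dataFile:
    dataFile.write("four\n")

with open(path, "r") as dataFile:
    lines = dataFile.read().splitlines()
print(lines)  # ['1', '2', '3', 'four']

# Opening with 'w' again discards everything written so far.
with open(path, "w") as dataFile:
    dataFile.write("five\n")
with open(path, "r") as dataFile:
    afterTruncate = dataFile.read().splitlines()
print(afterTruncate)  # ['five']

os.remove(path)
```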
3.3
Programming Styles
There are two main programming styles, both of which are supported by Python: procedural and object oriented programming. Procedural programming preceded object oriented programming; procedural scripts provide lists of commands which are run through sequentially.
Object oriented programming differs from procedural programming in that the program is split into a series of objects, usually representing real world objects or functionality, each defined by a class. Objects support the concept of inheritance, where functionality can be reused in many sub-objects. For example, a Person class may be written with functions such as eat, drink and beat heart. Specialist sub-objects may then be created with Person as a super-object, for example child, adult, male and female. These objects all require the functionality of Person, but it would be inefficient to duplicate the functionality they share in each individual object rather than grouping it into the Person class. This course will concentrate on basic object oriented programming, but below are the basic Python file outlines for both procedural and object oriented scripts.
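Inheritance can be sketched with a cut-down, hypothetical Person class and a Child sub-class. Child reuses __init__ and eat() from Person without redefining them, and overrides describe() with its own version. The method names are illustrative only:

```python
class Person(object):
    def __init__(self, name):
        self.name = name

    def eat(self):
        return self.name + " is eating"

    def describe(self):
        return self.name + " is a person"


class Child(Person):
    # Child inherits __init__ and eat() from Person unchanged,
    # and overrides describe() with its own version.
    def describe(self):
        return self.name + " is a child"


adult = Person("Pete")
child = Child("Sam")
print(adult.eat())        # Pete is eating
print(child.eat())        # Sam is eating (inherited from Person)
print(child.describe())   # Sam is a child (overridden in Child)
print(isinstance(child, Person))  # True
```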
3.3.1
When creating a procedural Python script, each of your files will have the same basic format, outlined below:
#! /usr/bin/env python
#######################################
# Comment explaining scripts purpose
# Author: <Author Name>
# Email: <Authors Email>
# Date: <Date Last Edited>
# Version: <Version Number>
#######################################

# IMPORTS
# e.g., import os

# SCRIPT
print("Hello World")

# End of File
3.3.2
When creating an object oriented script, each Python file you create will have the same basic format, outlined below:
#! /usr/bin/env python
#######################################
# Comment explaining scripts purpose
# Author: <Author Name>
# Email: <Authors Email>
# Date: <Date Last Edited>
# Version: <Version Number>
#######################################

# IMPORTS
import os

# CLASS EXPRESSION - In this case class name is Person
class Person (object): # Object is the superclass

    # CLASS ATTRIBUTES
    name = ''

    # INITIALISE THE CLASS (OFTEN EMPTY)
    def __init__(self):
        self.name = 'Dan'

    # METHOD TO PRINT PERSON NAME
    def printName(self):
        print('Name: ' + self.name)

    # METHOD TO SET PERSON NAME
    def setName(self, inputName):
        self.name = inputName

    # METHOD TO GET PERSON NAME
    def getName(self):
        return self.name
3.4
For simple scripts like those demonstrated so far, a procedural design is all that has been required. When creating more complex scripts, more structured and reusable designs are preferable, and to support this Python provides object oriented program design.
3.4.1
To illustrate the difference in implementation, an example is given and explained below. The example reads a comma separated text file (randfloats.txt) of random floating point numbers, from which the mean and standard deviation are calculated. Create a new Python script and copy the script below:
#! /usr/bin/env python
#######################################
# A python class to parse a comma
# separated text file to calculate
# the mean and standard deviation
# of the inputted floating point
# numbers.
NOTE: __name__ and __main__ each have TWO underscores on either side (i.e., two _ characters before and two after the word).
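The following is a minimal sketch of a class of this kind. It is not the tutorial's original listing, and the class name FileStats is hypothetical. It assumes the population standard deviation (dividing by n) and uses io.StringIO as a stand-in for an open randfloats.txt file:

```python
import io
import math

class FileStats(object):
    # Hypothetical class mirroring the parse / mean / std-dev structure described.

    def parseCommaFile(self, file):
        floatingNumbers = list()
        for eachLine in file:
            eachLine = eachLine.strip()
            if eachLine == "":
                continue  # skip blank lines
            for strVar in eachLine.split(","):
                floatingNumbers.append(float(strVar))
        return floatingNumbers

    def calcMean(self, data):
        return sum(data) / len(data)

    def calcStdDev(self, data):
        # Population standard deviation (divide by n)
        mean = self.calcMean(data)
        total = 0.0
        for value in data:
            total = total + (value - mean) ** 2
        return math.sqrt(total / len(data))

    def run(self, file):
        numbers = self.parseCommaFile(file)
        print("Mean:", self.calcMean(numbers))
        print("Standard Deviation:", self.calcStdDev(numbers))


if __name__ == "__main__":
    # io.StringIO stands in for an open randfloats.txt file
    obj = FileStats()
    obj.run(io.StringIO("2.0,4.0,4.0,4.0\n5.0,5.0,7.0,9.0\n"))
```

With these values the script prints a mean of 5.0 and a standard deviation of 2.0.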
Although an object oriented design has been introduced, potentially making the above code more reusable, the design does not separate more general functionality from the application. To do this, the code will be split into two files. The first, named MyMaths.py, will contain the mathematical operations calcMean and calcStdDev. The second, named FileSummary, contains the functions run, which controls the flow of the script, and parseCommaFile(). The code for these files is given below, but first try to split the code into the two files yourself.
#! /usr/bin/env python
#! /usr/bin/env python
#######################################
# A python class to parse a comma
# separated text file to calculate
# the mean and standard deviation
# of the inputted floating point
# numbers.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
To allow the script to be used as a command line tool, the path to the file needs to be passed into the script at runtime; therefore, the following changes are made to
#! /usr/bin/env python
#######################################
# A python class to parse a comma
# separated text file to calculate
# the mean and standard deviation
# of the inputted floating point
# numbers.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

from MyMaths import MyMathsClass
# To allow command line options to be
# retrieved the sys python library needs
# to be imported
import sys

class FileSummary (object):

    def parseCommaFile(self, file):
        floatingNumbers = list()
        for eachLine in file:
            substrs = eachLine.split(',', eachLine.count(','))
            for strVar in substrs:
                floatingNumbers.append(float(strVar))
        return floatingNumbers

    def run(self):
        # To retrieve the command line arguments
        # the sys.argv[X] is used where X refers to
        # the argument. The argument number starts
        # at 1 (sys.argv[0] holds the script name)
        # and is the index of a list.
        filename = sys.argv[1]
        inFile = open(filename, 'r')
        numbers = self.parseCommaFile(inFile)
To run the new script, the following command needs to be entered at the command prompt:
python fileSummary_commandline.py randfloats.txt
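If the script is run without an argument, sys.argv[1] raises an IndexError. The sketch below shows one way to guard against this. The helper name getFilename is hypothetical, and printing a usage message is a design choice of this sketch:

```python
import sys

def getFilename(argv):
    # argv[0] holds the script name, so the first user argument is argv[1].
    # Returns None when no argument was given.
    if len(argv) < 2:
        return None
    return argv[1]

if __name__ == "__main__":
    filename = getFilename(sys.argv)
    if filename is None:
        print("Usage: python fileSummary_commandline.py <input file>")
    else:
        print("Input file:", filename)
```

The standard library's argparse module offers a fuller alternative for scripts with several options.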
3.5
Exercise
Calculate the mean and standard deviation from only the first column of data.
Hint:
You will need to replace:
substrs = eachLine.split(',', eachLine.count(','))
for strVar in substrs:
    floatingNumbers.append(float(strVar))
With:
substrs = eachLine.split(',', eachLine.count(','))
# Select the column the data is stored in
column1 = substrs[0]
floatingNumbers.append(float(column1))
3.6
Further Reading
An Introduction to Python, G. van Rossum, F.L. Drake, Jr. Network Theory ISBN 0-95-416176-9 (Also available online - http://docs.python.org/3/tutorial/) - Chapter 7.
Python Documentation http://www.python.org/doc/
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall ISBN 0-13-226993-7
A common task for which Python is used is to batch process a task or series of tasks. To do this, the files to be processed need to be identified from within the file system. Therefore, in this tutorial you will learn to implement code to undertake this operation. To start, type out the code below into a new file (save it as IterateFiles.py).
#! /usr/bin/env python
#######################################
# A class that iterates through a directory
# or directory structure and prints out the
# identified files.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import os.path
import sys
    def run(self):
        # Set the folder to search
        searchFolder = './PythonCourse' # Update path...
        self.findFiles(searchFolder)
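The following is a minimal non-recursive sketch of a findFiles-style function. It is written as a standalone function rather than the tutorial's IterateFiles method, and it returns the paths as a list instead of printing them. The demonstration directory tree is hypothetical and is created in a temporary location:

```python
import os
import tempfile

def findFiles(directory):
    # Non-recursive: only entries directly inside `directory` are checked.
    found = []
    if not os.path.isdir(directory):
        print(directory + " is not a directory!")
        return found
    for filename in sorted(os.listdir(directory)):
        fullPath = os.path.join(directory, filename)
        if os.path.isfile(fullPath):
            found.append(fullPath)
    return found

# Build a small hypothetical directory tree to try the function on.
demoRoot = tempfile.mkdtemp()
open(os.path.join(demoRoot, "a.txt"), "w").close()
os.mkdir(os.path.join(demoRoot, "sub"))
open(os.path.join(demoRoot, "sub", "b.txt"), "w").close()

demoFiles = findFiles(demoRoot)
print(demoFiles)  # only a.txt: sub/b.txt lies inside a subdirectory
```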
Using the online Python documentation, read through the section on the file system: http://docs.python.org/library/filesys.html
This documentation will allow you to understand the functionality which is available for manipulating the file system.
4.2
Recursion
The next stage is to allow the function to go recursively through the directory structure. To do this, add the function below to your script above:
#! /usr/bin/env python
#######################################
# A class that iterates through a directory
# or directory structure and prints out the
# identified files.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import os.path
import sys

class IterateFiles (object):

    # A function which iterates through the directory
    def findFilesRecurse(self, directory):
        # check whether the current directory exists
        if os.path.exists(directory):
            # check whether the given directory is a directory
            if os.path.isdir(directory):
                # list all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        # If a directory is found recall this function.
                        self.findFilesRecurse(os.path.join(directory,filename))
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        print(os.path.join(directory,filename))
                    else:
                        print(filename + ' is NOT a file or directory!')
            else:
                print(directory + ' is not a directory!')
        else:
            print(directory + ' does not exist!')

    def run(self):
        # Set the folder to search
        searchFolder = './PythonCourse' # Update path...
        self.findFilesRecurse(searchFolder)

if __name__ == '__main__':
    obj = IterateFiles()
    obj.run()
Now call this function instead of findFiles. Think about, and then observe, what effect a function which calls itself has on the order in which the files are found.
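For comparison, the sketch below uses os.walk from the standard library. This is a different technique from the hand-written recursion in this tutorial. The recursive function descends into a subdirectory as soon as it meets one. By default, os.walk instead reports every file in a directory before moving on to that directory's subdirectories. The function name and demonstration tree are hypothetical:

```python
import os
import tempfile

def findFilesWalk(directory):
    found = []
    # os.walk visits `directory` and then every subdirectory below it,
    # yielding (current directory, subdirectory names, file names).
    for currentDir, subDirs, fileNames in os.walk(directory):
        subDirs.sort()  # visit subdirectories in alphabetical order
        for filename in sorted(fileNames):
            found.append(os.path.join(currentDir, filename))
    return found

# Build a small hypothetical directory tree to search.
demoRoot = tempfile.mkdtemp()
os.makedirs(os.path.join(demoRoot, "sub", "deeper"))
for relPath in ["a.txt", os.path.join("sub", "b.txt"),
                os.path.join("sub", "deeper", "c.txt")]:
    open(os.path.join(demoRoot, relPath), "w").close()

for foundPath in findFilesWalk(demoRoot):
    print(os.path.relpath(foundPath, demoRoot))
```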
4.3
Checking File Extension
The next step is to add the function checkFileExtension to your class and create two new functions which only print out the files with the file extension of interest. This should be done for both the recursive and non-recursive functions above.
#! /usr/bin/env python
#######################################
# A class that iterates through a directory
# or directory structure and prints out the
# identified files.
    # A function which iterates through the directory and checks file extensions
    def findFilesExtRecurse(self, directory, extension):
        # check whether the current directory exists
        if os.path.exists(directory):
            # check whether the given directory is a directory
            if os.path.isdir(directory):
                # list all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        # If a directory is found recall this function,
                        # passing the extension on so filtering continues.
                        self.findFilesExtRecurse(os.path.join(directory,filename), extension)
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        if(self.checkFileExtension(filename, extension)):
    # A function which iterates through the directory and checks file extensions
    def findFilesExt(self, directory, extension):
        # check whether the current directory exists
        if os.path.exists(directory):
            # check whether the given directory is a directory
            if os.path.isdir(directory):
                # list all the files within the directory
                dirFileList = os.listdir(directory)
                # Loop through the individual files within the directory
                for filename in dirFileList:
                    # Check whether file is directory or file
                    if(os.path.isdir(os.path.join(directory,filename))):
                        print(os.path.join(directory,filename) + \
                              ' is a directory and therefore ignored!')
                    elif(os.path.isfile(os.path.join(directory,filename))):
                        if(self.checkFileExtension(filename, extension)):
                            print(os.path.join(directory,filename))
                    else:
                        print(filename + ' is NOT a file or directory!')
            else:
                print(directory + ' is not a directory!')
        else:
            print(directory + ' does not exist!')

    def run(self):
        # Set the folder to search
        searchFolder = './PythonCourse' # Update path...
        self.findFilesExt(searchFolder, '.txt')

if __name__ == '__main__':
    obj = IterateFiles()
    obj.run()
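The following is one possible checkFileExtension, sketched as a standalone function. The tutorial's version is a method taking self, and its exact behaviour may differ. This sketch uses os.path.splitext, and its case-insensitive comparison is an assumption:

```python
import os.path

def checkFileExtension(filename, extension):
    # os.path.splitext splits 'image.TXT' into ('image', '.TXT').
    # The comparison is made case-insensitive so '.TXT' matches '.txt'.
    fileExt = os.path.splitext(filename)[1]
    return fileExt.lower() == extension.lower()

print(checkFileExtension("notes.txt", ".txt"))      # True
print(checkFileExtension("NOTES.TXT", ".txt"))      # True
print(checkFileExtension("image.tif", ".txt"))      # False
print(checkFileExtension("archive.tar.gz", ".gz"))  # True
```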
4.4
Exercises
1. Rather than print the file paths to screen, add them to a list and return them from the function. This is useful for applications where the files to be processed need to be known up front, and creates a more generic piece of Python which can be called from other scripts.
2. Using the returned list, add code to loop through the list and print out the file information in the following comma separated format.
[FILE NAME], [EXTENSION], [PATH], [DRIVE LETTER (On Windows)], [MODIFICATION TIME]
4.5
Further Reading
Python Documentation http://www.python.org/doc/
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall ISBN 0-13-226993-7
Many open source libraries are available from within Python. These significantly increase the available functionality, decreasing your development time. One such library is matplotlib (http://matplotlib.sourceforge.net), which provides a plotting library with a similar interface to those available within Matlab. The matplotlib website provides a detailed tutorial and documentation for all the different options available within the library. This worksheet provides some examples of the common plot types and a more complex example continuing on from previous examples.
5.2
Simple Script
Below is your first script using the matplotlib library. The script demonstrates the plotting of a mathematical function, in this case a sine function. The plot function requires two lists of numbers, which give the x and y locations of the points that make up the displayed function. The axes can be labelled using the xlabel() and ylabel() functions, while the title is set using the title() function. Finally, the show() function can be used to display the plot in an interactive window; the script below instead saves the plot to disk using savefig().
#! /usr/bin/env python
#######################################
# A simple python script to display a
# sine function
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

# import the matplotlib libraries
from pylab import *

# Create a list with values from 0 to 3 with 0.01 intervals
t = arange(0.0, 3, 0.01)
# Calculate the sin curve for the values within t
s = sin(pi*t)

# Plot the values in s and t
plot(t, s)
xlabel('X Axis')
ylabel('Y Axis')
title('Simple Plot')
# save plot to disk.
savefig('simpleplot.pdf', dpi=200, format='PDF')
5.3
Bar Chart
The creation of a bar chart is equally simple. Two lists are provided: the first contains the locations of the bars on the X axis and the second the heights of the bars. The width and colour of the bars can also be specified. More options are available in the documentation (http://matplotlib.
[Figure: Simple Plot, the sine curve plotted as Y Axis (-1.0 to 1.0) against X Axis (0.0 to 3.0)]
#! /usr/bin/env python
#######################################
# A simple python script to display a
# bar chart.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

from pylab import *

# Values for the Y axis (i.e., height of bars)
height = [5, 6, 7, 8, 12, 13, 9, 5, 7, 4, 3, 1]
# Values for the x axis
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# create plot with colour grey
bar(x, height, width=1, color='gray')
# save plot to disk.
savefig('simplebar.pdf', dpi=200, format='PDF')
5.4
Pie Chart
A pie chart is similar to the previous scripts. A list of the fractions making up the pie chart is given alongside a list of labels and, if required, a list of fractions by which to explode (offset) each wedge. Other options, including colour and shadow, are available and outlined in the documentation (http://matplotlib.sourceforge.net/matplotlib.pylab.html#-pie). This script also demonstrates the use of the savefig() function, allowing the plot to be saved to file rather than simply displayed on screen.
#! /usr/bin/env python
#######################################
# A simple python script to display a
# pie chart.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

from pylab import *

frac = [25, 33, 17, 10, 15]
labels = ['25', '33', '17', '10', '15']
explode = [0, 0.25, 0, 0, 0]

# Create pie chart
pie(frac, explode=explode, labels=labels, shadow=True)
# Give it a title
title('A Sample Pie Chart')
# save the plot to a PDF file
savefig('pichart.pdf', dpi=200, format='PDF')
5.5
Scatter Plot
The following script demonstrates the production of a scatter plot (http://matplotlib.sourceforge.net/matplotlib.pylab.html#-scatter). The lists x and y provide the locations of the points on the X and Y axes, and z provides the third dimension used to colour the points.
#! /usr/bin/env python
#######################################
# A simple python script to display a
# scatter plot.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

from pylab import *
# Import a random number generator
from random import random

x = []
y = []
z = []

# Create data values for X, Y, Z axis
for i in range(5000):
    x.append(random() * 100)
    y.append(random() * 100)
    z.append(x[i]-y[i])
5.6
Line Plot
A more complicated example is now given, building on the previous tutorial, where the data is read in from a text file before being plotted. In this case, data was downloaded from the Environment Agency and converted from columns to rows. The dataset provides the five year average rainfall for the summer (June - August) and winter (December - February) from 1766 to 2006. Two examples of plotting this data are given: the first plots the two datasets onto the same axes (Figure 5.5), while the second plots them onto individual axes (Figure 5.6). Information on the use of the subplot() function can be found in the matplotlib documentation (http://matplotlib.sourceforge.net/matplotlib.pylab.html#-subplot).
#######################################
# A python script to read in a text file
# of rainfall data for summer and winter
# within the UK and display as a plot.
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

from pylab import *
import os.path
import sys

class PlotRainfall (object):

    # Parse the input file - Three columns year, summer, winter
    def parseDataFile(self, dataFile, year, summer, winter):
        line = 0
        for eachLine in dataFile:
            commaSplit = eachLine.split(',', eachLine.count(','))
            first = True
            for token in commaSplit:
                if first:
                    first = False
                else:
            year = list()
            summer = list()
            winter = list()
            try:
                dataFile = open(filename, 'r')
            except IOError as e:
                print('\nCould not open file:\n', e)
                return
            self.parseDataFile(dataFile, year, summer, winter)
            dataFile.close()
            self.plotData(year, summer, winter, "Rainfall_SinglePlot.pdf")
            self.plotDataSeparate(year, summer, winter, "Rainfall_MultiplePlots.pdf")
        else:
            print('File \'' + filename + '\' does not exist.')

if __name__ == '__main__':
    obj = PlotRainfall()
    obj.run()
[Figure: Rainfall (5 Year Mean) against Year, 1750 to 2050]
Figure 5.5: Rainfall data for summer and winter on the same axis.
Figure 5.6: Rainfall data for summer and winter on different axes.
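The parsing step can be sketched without matplotlib. This sketch assumes the row layout described above: one row per variable, a label in the first column, then one comma separated value per five year period. It also assumes row 0 holds the years, row 1 summer and row 2 winter. The values are made-up placeholders, not the Environment Agency data. The empty lists are filled in place, which is why run() can pass them in and use them afterwards:

```python
import io

def parseDataFile(dataFile, year, summer, winter):
    # Assumed layout: row 0 = years, row 1 = summer, row 2 = winter,
    # each row starting with a text label that is skipped.
    line = 0
    for eachLine in dataFile:
        tokens = eachLine.strip().split(",")
        values = [float(token) for token in tokens[1:]]  # skip the label
        if line == 0:
            year.extend(values)
        elif line == 1:
            summer.extend(values)
        elif line == 2:
            winter.extend(values)
        line = line + 1

# Made-up placeholder values, not real rainfall data.
demoFile = io.StringIO("Year,1766,1771,1776\n"
                       "Summer,100,110,120\n"
                       "Winter,200,210,220\n")
year, summer, winter = list(), list(), list()
parseDataFile(demoFile, year, summer, winter)
print(year)    # [1766.0, 1771.0, 1776.0]
print(summer)  # [100.0, 110.0, 120.0]
```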
5.7
Exercise
Based on the available data, is there a correlation between summer and winter rainfall? Use the lists of summer and winter rainfall that were read in to produce a scatter plot to answer this question.
5.8
Further Reading
Matplotlib http://matplotlib.sourceforge.net
Python Documentation http://www.python.org/doc/
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall ISBN 0-13-226993-7
NumPy is a library for storing and manipulating multi-dimensional arrays. NumPy arrays are similar to lists; however, they have a lot more functionality and allow faster operations. SciPy is a library for maths and science using NumPy arrays and includes routines for statistics, optimisation and numerical integration. A comprehensive list is available from the SciPy website (http://docs.scipy.org/doc/scipy/reference). The combination of NumPy, SciPy and matplotlib provides similar functionality to that available in packages such as Matlab and Mathematica and allows for complex numerical analysis. This tutorial will introduce some of the statistical functionality of NumPy / SciPy by calculating statistics from forest inventory data read in from a text file. Linear regression will also be used to derive relationships between parameters. There are a number of ways to create NumPy arrays; one of the easiest (and the method that will be used in this tutorial) is to convert a Python list to an array:
import numpy as np
pythonList = [1, 4, 2, 5, 3]
numpyArray = np.array(pythonList)
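Once converted, an array carries its own shape and data type and provides methods such as max() and sum(). A list of lists becomes a 2-d array indexed by [row, column]. The sample values are arbitrary:

```python
import numpy as np

pythonList = [1, 4, 2, 5, 3]
numpyArray = np.array(pythonList)

print(numpyArray.shape)   # (5,)
print(numpyArray.dtype)   # an integer type; the exact width is platform dependent
print(numpyArray.max())   # 5
print(numpyArray.sum())   # 15

# A list of lists becomes a 2-d array
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)         # (2, 3)
print(grid[1, 2])         # 6: row 1, column 2
```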
6.2
Simple Statistics
Forest inventory data have been collected for a number of plots within Penglais woods (Aberystwyth, Wales). For each tree, the diameter, species, height, crown size and position have been recorded. An example script is provided to read the diameters into a separate list for each species. The lists are then converted to NumPy arrays, from which statistics are calculated and written out to a text file.
#! /usr/bin/env python
#######################################
# A script to calculate statistics from
# a text file using NumPy
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import numpy as np
import scipy as sp
# Import scipy stats functions we need
import scipy.stats as spstats

class CalculateStatistics (object):

    def run(self):
        # Set up lists to hold input diameters
        # A separate list is used for each species
        beechDiameter = list()
        ashDiameter = list()
        birchDiameter = list()
        oakDiameter = list()
        sycamoreDiameter = list()
        otherDiameter = list()

        # Open input and output files
        inFileName = 'PenglaisWoodsData.csv'
    def createStatsLine(self, inArray):
        # Calculate statistics for NumPy array and return output line.
        meanArray = np.mean(inArray)
        medianArray = np.median(inArray)
        stDevArray = np.std(inArray)
        skewArray = spstats.skew(inArray)

        # Create output line with stats
        statsLine = str(meanArray) + ',' + str(medianArray) + ',' + str(stDevArray)
        return statsLine

if __name__ == '__main__':
    obj = CalculateStatistics()
    obj.run()
Note that in tutorial three, functions were written to calculate the mean and standard deviation of a list; in this tutorial the same result is accomplished using the built-in functionality of NumPy.
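One detail worth checking when comparing np.std with a hand-written function is the divisor. By default, np.std divides by n (population standard deviation), while passing ddof=1 divides by n - 1 (sample standard deviation). The diameter values below are hypothetical:

```python
import numpy as np

diameters = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(np.mean(diameters))          # 5.0
print(np.median(diameters))        # 4.5
# By default np.std divides by n (population standard deviation)
print(np.std(diameters))           # 2.0
# ddof=1 divides by n - 1 instead (sample standard deviation)
print(np.std(diameters, ddof=1))   # about 2.138
```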
6.2.1
Exercises
1. Based on the example script, also calculate the mean, median and standard deviation for tree heights and add them to the output file.
2. Look at other statistics functions available in SciPy and calculate them for height and density.
6.3
Calculate Biomass
One of the features of NumPy arrays is the ability to perform mathematical operations on all elements of an array. For example, for NumPy array a:
a = np.array([1,2,3,4])
Performing
b = 2 * a
Gives
b = array([2,4,6,8])
Some special versions of functions are available to work on arrays. To calculate the natural log of a single number, log may be used, while to calculate the natural log of every element of an array, np.log may be used (where NumPy has been imported as np). Tree volume may be calculated from height and stem diameter using:

$$\text{Volume} = a + bD^{2}h^{0.75} \qquad (6.1)$$
where D is diameter and h is height. The coefficients a and b vary according to species (see Table 6.1). From volume, it is possible to calculate biomass by multiplying by the specific gravity:
$$\text{Biomass} = \text{Volume} \times \text{specific gravity} \qquad (6.2)$$
The specific gravity also varies by species; values for each species are given in Table 6.1.
Table 6.1: Coefficients for estimating volume and the specific gravity required for estimating biomass, by species.
Species | a-coefficient | b-coefficient | Specific gravity
Beech | 0.014306 | 0.0000748 | 0.56
Ash | 0.012107 | 0.0000777 | 0.54
Birch | 0.009184 | 0.0000673 | 0.53
Oak | 0.011724 | 0.0000765 | 0.56
Sycamore | 0.012668 | 0.0000737 | 0.54
The following function takes two arrays containing diameter and height, and a string for species. From these, biomass is calculated.
    def calcBiomass(self, inDiameterArray, inHeightArray, inSpecies):
        if inSpecies == 'BEECH':
            a = 0.014306
            b = 0.0000748
            specificGravity = 0.56
        # Calculate Volume
        volume = a + ((b*(inDiameterArray / 100)**2) * (inHeightArray**0.75))
        # Calculate biomass
        biomass = volume * specificGravity
        # Return biomass
        return biomass
Note that only the coefficients for BEECH have been included; therefore, if a different species is passed in, the program will produce an error (try to think about what the error would be). A neater way of dealing with this would be to throw an exception if the species is not recognised. Exceptions form the basis of error handling in a number of programming languages (including C++ and Java). The concept is simple: while a program is running, if an error occurs an exception is thrown (in Python, raised), at which point processing stops until the exception is caught and dealt with. If the exception is never caught, the program crashes and stops. Python provides the following syntax for exception handling:
try:
    < Perform operations during which an error is likely to occur >
except <ExceptionName>:
    < If error occurs do something appropriate >
where the code you wish to run is written inside the try block, and the except block is executed only when the exception named in the except statement is raised within the try block. It is good practice to use exceptions where possible, as when used properly they make code more robust and allow more useful feedback to be given to the user. The function to calculate biomass may be rewritten to throw an exception if the species is not recognised.
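A minimal, self-contained illustration of the syntax, using Python's built-in float() conversion (the function name toFloat is hypothetical). Only the named exception is caught; any other error still propagates to the caller:

```python
def toFloat(text):
    """Convert text to a float, returning None if it is not a valid number."""
    try:
        # float() raises ValueError for text such as 'abc'
        return float(text)
    except ValueError:
        # Only ValueError is caught here; e.g. a TypeError still propagates
        return None

print(toFloat('12.5'))
print(toFloat('abc'))
```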
def calcBiomass(self, inDiameterArray, inHeightArray, inSpecies):
    if inSpecies == 'BEECH':
        a = 0.014306
        b = 0.0000748
        specificGravity = 0.56
    else:
        # Raise exception if species is not recognised
        raise Exception('Species not recognised')
    # Calculate volume (diameter in cm, height in m)
    volume = a + ((b*(inDiameterArray)**2) * (inHeightArray**0.75))
    # Calculate biomass
    biomass = volume * specificGravity
    # Return biomass
    return biomass
The function below calls calcBiomass to calculate biomass for an array. From this, the mean, median and standard deviation are calculated and returned as an output line. Because calcBiomass is called from within a try and except block, if the species is not recognised no statistics are calculated and the string na (not available) is returned for all values in the output line.
def calcBiomassStatsLine(self, inDiameterArray, inHeightArray, inSpecies):
    # Calculates biomass, calculates stats from biomass and returns output line
    biomassStatsLine = ''
    try:
        # Calculate biomass
        biomass = self.calcBiomass(inDiameterArray, inHeightArray, inSpecies)
        # Calculate stats from biomass
        meanBiomass = np.mean(biomass)
        medianBiomass = np.median(biomass)
        stDevBiomass = np.std(biomass)
        # Create output line
        biomassStatsLine = str(meanBiomass) + ',' + str(medianBiomass) + ',' + \
            str(stDevBiomass)
    except Exception:
        # Catch exception and write na for all values
        biomassStatsLine = 'na,na,na'
    return biomassStatsLine
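An alternative design, sketched below under stated assumptions, stores the coefficients from Table 6.1 in a dictionary and raises the more specific built-in ValueError for an unknown species, so that unrelated errors are not silently turned into 'na'. COEFFICIENTS, calcBiomass and biomassStatsLine here are hypothetical stand-alone versions (no self) of the methods above; only the Beech and Ash rows are filled in, leaving the other species for the exercise.

```python
import numpy as np

# Coefficients from Table 6.1: species -> (a, b, specific gravity)
COEFFICIENTS = {
    'BEECH': (0.014306, 0.0000748, 0.56),
    'ASH': (0.012107, 0.0000777, 0.54),
}

def calcBiomass(diameter, height, species):
    # Look up the coefficients, raising ValueError for an unknown species
    if species not in COEFFICIENTS:
        raise ValueError('Species not recognised: ' + str(species))
    a, b, specificGravity = COEFFICIENTS[species]
    # Equations 6.1 and 6.2 element-wise (diameter in cm, height in m, assumed)
    volume = a + b * np.asarray(diameter)**2 * np.asarray(height)**0.75
    return volume * specificGravity

def biomassStatsLine(diameter, height, species):
    try:
        biomass = calcBiomass(diameter, height, species)
    except ValueError:
        # Unknown species: report 'not available' for each statistic
        return 'na,na,na'
    stats = (np.mean(biomass), np.median(biomass), np.std(biomass))
    return ','.join(str(value) for value in stats)

print(biomassStatsLine([30.0, 40.0], [20.0, 25.0], 'BEECH'))
print(biomassStatsLine([30.0, 40.0], [20.0, 25.0], 'OTHER'))
```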
#! /usr/bin/env python

#######################################
# A script to calculate statistics from
# a text file using NumPy
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import numpy as np
import scipy as sp
# Import scipy stats functions we need
import scipy.stats as spstats

class CalculateStatistics (object):
        outLine = 'Ash,' + self.createStatsLine(ashDiameter) + ',' + \
            self.createStatsLine(ashHeight) + ',' + \
            self.calcBiomassStatsLine(ashDiameter, ashHeight, 'ASH') + '\n'
        outFile.write(outLine)
        outLine = 'Birch,' + self.createStatsLine(birchDiameter) + ',' + \
            self.createStatsLine(birchHeight) + ',' + \
            self.calcBiomassStatsLine(birchDiameter, birchHeight, 'BIRCH') + '\n'
        outFile.write(outLine)
        outLine = 'Oak,' + self.createStatsLine(oakDiameter) + ',' + \
            self.createStatsLine(oakHeight) + ',' + \
            self.calcBiomassStatsLine(oakDiameter, oakHeight, 'OAK') + '\n'
        outFile.write(outLine)
        outLine = 'Sycamore,' + self.createStatsLine(sycamoreDiameter) + ',' + \
            self.createStatsLine(sycamoreHeight) + ',' + \
            self.calcBiomassStatsLine(sycamoreDiameter, sycamoreHeight, 'SYC') + '\n'
        outFile.write(outLine)
        outLine = 'Other,' + self.createStatsLine(otherDiameter) + ',' + \
            self.createStatsLine(otherHeight) + ',' + \
            self.calcBiomassStatsLine(otherDiameter, otherHeight, 'Other') + '\n'
        outFile.write(outLine)

        print('Statistics written to: ' + outFileName)

    def createStatsLine(self, inArray):
        # Calculate statistics for array and return output line.
        meanArray = np.mean(inArray)
        medianArray = np.median(inArray)
        stDevArray = np.std(inArray)
        # Create output line with stats
        statsLine = str(meanArray) + ',' + str(medianArray) + ',' + str(stDevArray)
        return statsLine

    def calcBiomassStatsLine(self, inDiameterArray, inHeightArray, inSpecies):
        # Calculates biomass, calculates stats from biomass and returns output line
        biomassStatsLine = ''
        try:
            # Calculate biomass
            biomass = self.calcBiomass(inDiameterArray, inHeightArray, inSpecies)
            # Calculate stats from biomass
            meanBiomass = np.mean(biomass)
            medianBiomass = np.median(biomass)
            stDevBiomass = np.std(biomass)
            # Create output line
            biomassStatsLine = str(meanBiomass) + ',' + str(medianBiomass) + ',' + \
                str(stDevBiomass)
        except Exception:
            # Catch exception and write na for all values
            biomassStatsLine = 'na,na,na'
        return biomassStatsLine

    def calcBiomass(self, inDiameterArray, inHeightArray, inSpecies):
        if inSpecies == 'BEECH':
            a = 0.014306
            b = 0.0000748
            specificGravity = 0.56
        else:
            # Raise exception if species is not recognised
            raise Exception('Species not recognised')
        # Calculate volume
        volume = a + ((b*(inDiameterArray)**2) * (inHeightArray**0.75))
        # Calculate biomass
        biomass = volume * specificGravity
        # Return biomass
        return biomass

if __name__ == '__main__':
    obj = CalculateStatistics()
    obj.run()
6.3.1 Exercise
1. Add in the coefficients to calculate biomass for the other species.
2. Write the statistics for biomass out to the text file. Remember to change the header line.
6.4 Linear Fitting
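One of the built-in features of SciPy is the ability to perform fits. Using the linear regression function (linregress) it is possible to fit equations of the form:

$$y = \mathit{aCoeff}\,x + \mathit{bCoeff} \qquad (6.3)$$

with the results returned as (aCoeff, bCoeff, rVal, pVal, stdError) = linregress(x, y), where aCoeff and bCoeff are the coefficients (slope and intercept), rVal is the r value (rVal**2 gives R²), pVal is the p value and stdError is the standard error of the estimated slope. It is possible to fit the following equation to the collected data, expressing height as a function of diameter:

$$h = \mathit{aCoeff}\,\ln(D) + \mathit{bCoeff} \qquad (6.4)$$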
To fit an equation of this form, an array containing the natural log of diameter must be created (using np.log). Linear regression may then be performed using:
linregress(np.log(inDiameterArray), inHeightArray)
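To see what linregress returns, the sketch below fits Equation 6.4 to synthetic heights generated exactly from h = 10 ln(D) + 2 (made-up coefficients), so the fit should recover a slope of 10 and an intercept of 2 with r equal to 1.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic diameters (cm) and heights (m) that follow h = 10*ln(D) + 2 exactly
diameter = np.array([5.0, 10.0, 20.0, 40.0, 80.0])
height = 10.0 * np.log(diameter) + 2.0

# Regress height against log(diameter); the result unpacks into five values
(aCoeff, bCoeff, rVal, pVal, stdError) = linregress(np.log(diameter), height)

print('aCoeff (slope):', aCoeff)
print('bCoeff (intercept):', bCoeff)
print('R squared:', rVal**2)
```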
To test the fit, it may be plotted against the original data using Matplotlib. The following code first performs the linear regression and then creates a plot showing the fit against the original data.
def plotLinearRegression(self, inDiameterArray, inHeightArray, outPlotName):
    # Perform fit
    (aCoeff, bCoeff, rVal, pVal, stdError) = spstats.linregress(np.log(inDiameterArray), \
        inHeightArray)
    # Use fits to predict height for a range of diameters
    testDiameter = np.arange(min(inDiameterArray), max(inDiameterArray), 1)
    predictHeight = (aCoeff * np.log(testDiameter)) + bCoeff
    # Create a string, showing the form of the equation (with fitted coefficients)
    # and r squared value
    # Coefficients are rounded to two decimal places.
    equation = str(round(aCoeff,2)) + 'log(D) + ' + str(round(bCoeff,2)) + \
        ' (r$^2$ = ' + str(round(rVal**2,2)) + ')'
    # Plot fit against original data
    plt.plot(inDiameterArray, inHeightArray, '.')
    plt.plot(testDiameter, predictHeight)
    plt.xlabel('Diameter (cm)')
    plt.ylabel('Height (m)')
    plt.legend(['measured data', equation])
    # Save plot
    plt.savefig(outPlotName, dpi=200, format='pdf')
    # Close the figure so that the next plot starts from a blank figure
    plt.close()
The coefficients and r² of the fit are displayed in the legend. To display the superscript 2 in the legend, LaTeX syntax can be used, so r² is written as r$^2$. The function may be called using:
# Set output directory for plots
outDIR = './output/directory/'
self.plotLinearRegression(beechDiameter, beechHeight, outDIR + 'beech.pdf')
Produce a plot similar to the one shown in Figure 6.1 and save it as a PDF. The final script should look like the following:
Figure 6.1: Plot of Height (m) against Diameter (cm).

#! /usr/bin/env python

#######################################
# A script to calculate statistics from
# a text file using NumPy
# Author: <YOUR NAME>
# Email: <YOUR EMAIL>
# Date: DD/MM/YYYY
# Version: 1.0
#######################################

import numpy as np
import scipy as sp
# Import scipy stats functions we need
import scipy.stats as spstats
# Import plotting library as plt
import matplotlib.pyplot as plt

class CalculateStatistics (object):

    def run(self):
        # Set up lists to hold input diameters and heights
        # A separate list is used for each species
        beechDiameter = list()
        beechHeight = list()
        ashDiameter = list()
        ashHeight = list()
        birchDiameter = list()
        birchHeight = list()
        oakDiameter = list()
        # Set output directory for plots
        outDIR = './'
        # Plot linear regression for Beech
        print('Generating plot:')
        self.plotLinearRegression(beechDiameter, beechHeight, outDIR + 'beech.pdf')

    def plotLinearRegression(self, inDiameterArray, inHeightArray, outPlotName):
        # Perform fit
        (aCoeff, bCoeff, rVal, pVal, stdError) = spstats.linregress(np.log(inDiameterArray), \
            inHeightArray)
        # Use fits to predict height for a range of diameters
        testDiameter = np.arange(min(inDiameterArray), max(inDiameterArray), 1)
        predictHeight = (aCoeff * np.log(testDiameter)) + bCoeff
        # Create a string, showing the form of the equation (with fitted coefficients)
        # and r squared value
        # Coefficients are rounded to two decimal places.
        equation = str(round(aCoeff,2)) + 'log(D) + ' + str(round(bCoeff,2)) + \
            ' (r$^2$ = ' + str(round(rVal**2,2)) + ')'
        # Plot fit against original data
        plt.plot(inDiameterArray, inHeightArray, '.')
        plt.plot(testDiameter, predictHeight)
        plt.xlabel('Diameter (cm)')
        plt.ylabel('Height (m)')
        plt.legend(['measured data', equation])
        # Save plot
        plt.savefig(outPlotName, dpi=200, format='pdf')
        # Close the figure so that the next plot starts from a blank figure
        plt.close()

    def createStatsLine(self, inArray):
        # Calculate statistics for array and return output line.
        meanArray = np.mean(inArray)
        medianArray = np.median(inArray)
        stDevArray = np.std(inArray)
        # Create output line with stats
        statsLine = str(meanArray) + ',' + str(medianArray) + ',' + str(stDevArray)
        return statsLine

    def calcBiomassStatsLine(self, inDiameterArray, inHeightArray, inSpecies):
        # Calculates biomass, calculates stats from biomass and returns output line
        biomassStatsLine = ''
        try:
            # Calculate biomass
            biomass = self.calcBiomass(inDiameterArray, inHeightArray, inSpecies)
            # Calculate stats from biomass
            meanBiomass = np.mean(biomass)
            medianBiomass = np.median(biomass)
            stDevBiomass = np.std(biomass)
            # Create output line
            biomassStatsLine = str(meanBiomass) + ',' + str(medianBiomass) + ',' + \
                str(stDevBiomass)
        except Exception:
            # Catch exception and write na for all values
            biomassStatsLine = 'na,na,na'
        return biomassStatsLine

    def calcBiomass(self, inDiameterArray, inHeightArray, inSpecies):
        if inSpecies == 'BEECH':
            a = 0.014306
            b = 0.0000748
            specificGravity = 0.56
        else:
            # Raise exception if species is not recognised
            raise Exception('Species not recognised')
        # Calculate volume
        volume = a + ((b*(inDiameterArray)**2) * (inHeightArray**0.75))
        # Calculate biomass
        biomass = volume * specificGravity
        # Return biomass
        return biomass

if __name__ == '__main__':
    obj = CalculateStatistics()
    obj.run()
6.4.1 Exercise
Produce plots, showing linear regression fits, for the other species.
6.5 Further Reading
SciPy: http://www.scipy.org/SciPy
NumPy: http://numpy.scipy.org
An Introduction to Python, G. van Rossum, F.L. Drake, Jr. Network Theory. ISBN 0-95-416176-9 (also available online at http://docs.python.org/3/tutorial/) - Chapter 8.
Python Documentation: http://www.python.org/doc/
Matplotlib: http://matplotlib.sourceforge.net
There are many command line tools and utilities available for all platforms (e.g., Windows, Linux, Mac OS X). These tools are extremely useful and range from simple tasks, such as renaming a file, to more complex tasks, such as merging ESRI shapefiles. One problem with these tools is that if you have a large number of files which need to be processed in the same way, it is time consuming and error prone to run the command manually for each file. Therefore, if we can write scripts to do this work for us, processing a large number of individual files becomes a much simpler and quicker task.

For this worksheet you will need to have the command line tools which come with the GDAL/OGR (http://www.gdal.org) open source software library installed and available on your path. The installation of python(x,y) includes the python libraries for GDAL/OGR but not the command line utilities which go along with these libraries. If you do not already have them installed, details for your platform are available on the GDAL website.
7.2
The first example illustrates how the ogr2ogr command can be used to merge shapefiles, and how a python script can be used to turn this command into a batch process where a whole directory of shapefiles can be merged. To perform this operation two commands are required. The first makes a copy of the first shapefile in the list of files into a new file, as shown below:
> ogr2ogr <outputfile> <inputfile>

The second command appends the contents of the input shapefile onto the end of an existing shapefile (i.e., the one just copied):

> ogr2ogr -update -append <outputfile> <inputfile> -nln <outputlayername>
For both of these commands the shapefiles all need to be of the same type (point, polyline or polygon) and contain the same attributes. Therefore, your first exercise is to understand the use of the ogr2ogr command and to try these commands from the command line with the data provided. Hint: running ogr2ogr without any options displays its help text.

The second stage is to develop a python script to call the appropriate commands to perform the required operation, where the following processes will be required:

1. Get the user inputs.
2. List the contents of the input directory.
3. Iterate through the directory and run the required commands.

But the first step is to create the class structure in which the code will fit; this will be something similar to that shown below:
#! /usr/bin/env python

#######################################
# MergeSHPfiles.py
# A python script to merge shapefiles
# Author: <YOUR NAME>
The script will have the input directory and output file hard coded (as shown) within the run function. Therefore, you need to edit these file paths to the location where you have saved the files. Please note that under Windows you need to use a double backslash (i.e., \\) within the file path, as a single backslash is an escape character (e.g., \n for new line) within strings. The next step is to check that the input directory exists and is a directory; to do this, edit your run function as below.
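A small sketch of the escaping issue described above; the Windows-style path is made up for illustration. A raw string (prefixed with r) is an alternative to doubling every backslash, although a raw string cannot end in a single backslash.

```python
# '\n' inside a normal string is a newline, not a backslash followed by n
badPath = 'C:\new_data'
# Doubling the backslash gives one literal backslash
goodPath = 'C:\\new_data'
# A raw string treats backslashes literally
rawPath = r'C:\new_data'

print(goodPath == rawPath)   # True
print('\n' in badPath)       # True: the path contains a newline character
```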
# A function which controls the rest of the script
def run(self):
    # Define the input directory
    filePath = './TreeCrowns/'
    # Define the output file
    newSHPfile = 'Merged_shapefile.shp'

    # Check input file path exists and is a directory
    if not os.path.exists(filePath):
        print('Filepath does not exist')
    elif not os.path.isdir(filePath):
        print('Filepath is not a directory!')
    else:
        # Merge the shapefiles within the filePath
        self.mergeSHPfiles(filePath, newSHPfile)
Additionally, you need to add the function mergeSHPfiles, which is where the shapefiles will be merged.
# A function to control the merging of shapefiles
def mergeSHPfiles(self, filePath, newSHPfile):
    pass  # Placeholder body, completed in the following steps
To merge the shapefiles, the first task is to get a list of all the shapefiles within a directory. To do this, use the code you developed in Tutorial 4 to list files within a directory and edit it so that the files are output to a list rather than printed to the screen, as shown below.
# A function to test the file extension of a file
def checkFileExtension(self, filename, extension):
    # Boolean variable to be returned by the function
    foundExtension = False
    # Split the filename into two parts (name + ext)
    filenamesplit = os.path.splitext(filename)
    # Get the file extension into a variable
    fileExtension = filenamesplit[1].strip()
    # Decide whether extensions are equal
    if fileExtension == extension:
        foundExtension = True
    # Return result
    return foundExtension

# A function which iterates through the directory and checks file extensions
def findFilesExt(self, directory, extension):
    # Define a list to store output list of files
    fileList = list()
    # Check whether the current directory exists
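For reference, a compact stand-alone sketch of the same idea is shown below. It is not the authors' listing: the function name findFilesWithExtension is hypothetical, and returning full paths built with os.path.join (rather than bare file names) is an assumption about what the caller needs.

```python
import os

def findFilesWithExtension(directory, extension):
    """Return a sorted list of paths in directory whose extension matches."""
    fileList = list()
    # Return an empty list if the path does not exist or is not a directory
    if not os.path.isdir(directory):
        print(directory + ' is not a directory!')
        return fileList
    for filename in sorted(os.listdir(directory)):
        fullPath = os.path.join(directory, filename)
        # Skip sub-directories; keep only files with the requested extension
        if os.path.isfile(fullPath) and os.path.splitext(filename)[1] == extension:
            fileList.append(fullPath)
    return fileList
```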
Note that you also need the function to check the file extension. The findFilesExt function can then be called from the mergeSHPfiles function, with a loop to iterate through the identified files.
# A function to control the merging of shapefiles
def mergeSHPfiles(self, filePath, newSHPfile):
    # Get the list of files within the directory
    # provided with the extension .shp
    fileList = self.findFilesExt(filePath, '.shp')
    # Iterate through the files.
    for file in fileList:
        print(file)
When iterating through the files, the ogr2ogr commands needed to merge the shapefiles must be built and executed, so the following code needs to be added to your script.
# A function to control the merging of shapefiles
def mergeSHPfiles(self, filePath, newSHPfile):
    # Get the list of files within the directory
    # provided with the extension .shp
    fileList = self.findFilesExt(filePath, '.shp')
    # Variable used to identify the first file
    first = True
    # A string for the command to be built
    command = ''
    # Iterate through the files.
    for file in fileList:
        if first:
            # If the first file make a copy to create the output file
            command = 'ogr2ogr ' + newSHPfile + ' ' + file
            first = False
        else:
            # Otherwise append the current shapefile to the output file
            command = 'ogr2ogr -update -append ' + newSHPfile + ' ' + \
                file + ' -nln ' + \
                self.removeSHPExtension(self.removeFilePathUNIX(newSHPfile))
        # Execute the current command
        os.system(command)
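As an alternative to building a single command string for os.system, the commands can be built as lists of arguments and run with subprocess.run (Python 3.5 or later), which avoids problems when file paths contain spaces. The sketch below separates building the argument lists from running them; the helper names buildMergeCommands and runCommands are hypothetical, and the argument order follows the code above, with the output shapefile before the input.

```python
import os
import subprocess

def buildMergeCommands(fileList, newSHPfile):
    """Return one ogr2ogr argument list per input shapefile."""
    # Layer name for -nln: output file name without directory or .shp extension
    layerName = os.path.splitext(os.path.basename(newSHPfile))[0]
    commands = list()
    for i, shpFile in enumerate(fileList):
        if i == 0:
            # First file: copy it to create the output shapefile
            commands.append(['ogr2ogr', newSHPfile, shpFile])
        else:
            # Later files: append to the existing output shapefile
            commands.append(['ogr2ogr', '-update', '-append', newSHPfile,
                             shpFile, '-nln', layerName])
    return commands

def runCommands(commands):
    for command in commands:
        # check=True raises CalledProcessError if ogr2ogr fails
        subprocess.run(command, check=True)
```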
You also require the additional functions, given below, which remove the shapefile extension (.shp) and the file path from the output file name to create the layer name.
# A function to remove a .shp extension from a file name
def removeSHPExtension(self, name):
    # The output file name
    outName = name
    # Find the position of the first .shp in the
    # current file name
    count = name.find('.shp', 0, len(name))
    # If there are no instances of .shp then -1 will be returned
    if not count == -1:
        # Replace all instances of .shp with empty string.
        outName = name.replace('.shp', '', name.count('.shp'))
    # Return output file name without .shp
    return outName

# A function to remove the file path from a file name
# (in this case a windows file path)
def removeFilePathWINS(self, name):
    # Remove white space (i.e., spaces, tabs)
    name = name.strip()
    # Count the number of backslashes
    # A double backslash is required because \ is a
    # string escape character.
    count = name.count('\\')
    # Split string into a list where backslashes occur
    nameSegments = name.split('\\', count)
    # Return the last item in the list
    return nameSegments[count]

# A function to remove the file path from a file name
# (in this case a UNIX file path)
def removeFilePathUNIX(self, name):
    # Remove white space (i.e., spaces, tabs)
    name = name.strip()
    # Count the number of slashes
    count = name.count('/')
    # Split string into a list where slashes occur
    nameSegments = name.split('/', count)
    # Return the last item in the list
    return nameSegments[count]
If you want to use this script on UNIX (i.e., Linux or Mac OS X), use the removeFilePathUNIX function as shown; for Windows, change the code to use the removeFilePathWINS function, which splits on the escaped backslashes. Your script should now be complete, so execute it on the data provided within the TreeCrowns directory. Take time to understand the lines of code which have been provided and make sure your script works.
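The standard os.path module already provides portable versions of these operations, so an alternative is sketched below (the name layerNameFromPath is hypothetical). os.path.basename uses the current platform's separators (ntpath.basename can be used to handle Windows paths on any platform), and os.path.splitext removes only the final extension, whereas removeSHPExtension removes every occurrence of .shp.

```python
import os

def layerNameFromPath(path):
    # os.path.basename drops the directory part; os.path.splitext then
    # splits off the final extension, leaving the bare layer name
    return os.path.splitext(os.path.basename(path.strip()))[0]

print(layerNameFromPath('./output/Merged_shapefile.shp'))
```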
# A function which iterates through the directory and checks file extensions
def findFilesExt(self, directory, extension):
    # Define a list to store output list of files
    fileList = list()
    # Check whether the current directory exists
    if os.path.exists(directory):
        # Check whether the given directory is a directory
        if os.path.isdir(directory):
            # List all the files within the directory
            dirFileList = os.listdir(directory)
            # Loop through the individual files within the directory
            for filename in dirFileList:
                # Check whether file is directory or file
                if os.path.isdir(os.path.join(directory, filename)):
                    print(os.path.join(directory, filename) + \
                        ' is a directory and therefore ignored!')
                elif os.path.isfile(os.path.join(directory, filename)):
                    if self.checkFileExtension(filename, extension):
7.3
The next example requires you to use the script developed above as the basis for a new script that converts a directory of images to GeoTIFF using the command below:
gdal_translate -of <OutputFormat> <InputFile> <OutputFile>
A useful step is to first run the command from the command line manually to make sure you understand how it works. The two main things you need to think about are:

1. What file extension will the input files have? This should be user selectable alongside the file paths.
2. What output file name should be provided? The script should generate this.

Four test images have been provided in ENVI format within the directory ENVI Images, which you can use for testing your script. If you are struggling, an example script with a solution to this task has been provided within the code directory.
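One possible way to generate the output file name and the gdal_translate arguments for each input image is sketched below. The helper name buildTranslateCommand, the '.tif' extension and the example paths are assumptions for illustration; GTiff is GDAL's short name for the GeoTIFF driver. The returned list could be passed to subprocess.run.

```python
import os

def buildTranslateCommand(inputFile, outputDIR, outFormat='GTiff', outExt='.tif'):
    """Return the gdal_translate argument list and the output file path."""
    # Keep the input file's base name but swap its extension
    baseName = os.path.splitext(os.path.basename(inputFile))[0]
    outputFile = os.path.join(outputDIR, baseName + outExt)
    command = ['gdal_translate', '-of', outFormat, inputFile, outputFile]
    return command, outputFile

command, outputFile = buildTranslateCommand('ENVI_Images/image1.env', 'output')
print(' '.join(command))
```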
7.3.1
It is often convenient to provide the inputs a script requires (e.g., input and output file locations) as arguments to the script, rather than needing to edit the script each time a different set of parameters is required (i.e., changing the file paths in the scripts above). This is easy within python and just requires the following changes to your run function (in this case for the merge shapefiles script).
# A function which controls the rest of the script
def run(self):
    # Get the number of arguments
    numArgs = len(sys.argv)
    # Check there are exactly 2 input arguments (i.e., the input
    # directory and the output file).
    # Note that argument 0 (i.e., sys.argv[0]) is the name
    # of the script currently running.
    if numArgs == 3:
        # Retrieve the input directory
        filePath = sys.argv[1]
        # Retrieve the output file
        newSHPfile = sys.argv[2]

        # Check input file path exists and is a directory
        if not os.path.exists(filePath):
            print('Filepath does not exist')
        elif not os.path.isdir(filePath):
            print('Filepath is not a directory!')
        else:
            # Merge the shapefiles within the filePath
            self.mergeSHPfiles(filePath, newSHPfile)
    else:
        print("ERROR. Command should have the form:")
        print("python MergeSHPfiles_cmd.py <Input File Path> <Output File>")
In addition to these changes, you need to import the system library (import sys) into your script to access these arguments.
Please note that the list of user-provided inputs starts at index 1, not 0; sys.argv[0] returns the name of the script being executed. When retrieving values from the user in this form, it is highly advisable to check whether the inputs provided are valid and that all required inputs have been provided. Create a copy of the script you created earlier and edit the run function to be as shown above, making note of the lines which require editing.
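Because sys.argv is just a list of strings, the argument handling can be tested by passing a list to a function instead of reading sys.argv directly. The function name parseMergeArgs is hypothetical; it mirrors the length check in the run function above.

```python
import sys

def parseMergeArgs(argv):
    """Return (filePath, newSHPfile) from an argv-style list, or None."""
    # argv[0] is the script name, so two user inputs means len(argv) == 3
    if len(argv) != 3:
        return None
    return argv[1], argv[2]

if __name__ == '__main__':
    print(parseMergeArgs(sys.argv))
```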
7.4 Exercises
1. Using ogr2ogr, develop a script that will convert the attribute table of a shapefile to a CSV file which can be opened within Microsoft Excel. Note that the output CSV will be put into a separate directory.
2. Create a script which calls the gdal_translate command and converts all the images within a directory to a byte data type (i.e., with a range of 0 to 255).
7.5 Further Reading
GDAL - http://www.gdal.org
OGR - http://www.gdal.org/ogr
Python Documentation - http://www.python.org/doc
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall. ISBN 0-13-226993-7
Learn UNIX in 10 minutes - http://freeengineer.org/learnUNIXin10minutes.html
The Linux Command Line. W. E. Shotts. No Starch Press. ISBN 978-159327-389-7 (Available to download from http://linuxcommand.org/tlcl.
Image files used within spatial data processing (i.e., remote sensing and GIS) require a spatial header which provides the origin (usually the top left corner of the image), the pixel resolution of the image and a definition of the coordinate system and projection of the dataset. Additionally, most formats also allow a rotation to be defined. Using these fields, the geographic position on the Earth's surface can be defined for each pixel within the scene. Images can also contain other information in the header of the file, including no data values, image statistics and band names/descriptions.
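In GDAL these header fields are held as a six-element geotransform, and the geographic coordinate of a pixel follows from an affine transform: x = GT[0] + col·GT[1] + row·GT[2] and y = GT[3] + col·GT[4] + row·GT[5]. The worked sketch below uses the header values reported for LSTOA_Tanz_2000Wet.img in this chapter (origin, pixel size and zero rotation); the function name pixelToGeo is hypothetical. The coordinates returned are those of the pixel's top-left corner; add 0.5 to col and row for the pixel centre.

```python
def pixelToGeo(geotransform, col, row):
    """Convert a pixel (column, row) to geographic (x, y) coordinates.

    geotransform follows GDAL's order:
    (TL x, x pixel size, x rotation, TL y, y rotation, y pixel size)
    """
    x = geotransform[0] + col * geotransform[1] + row * geotransform[2]
    y = geotransform[3] + col * geotransform[4] + row * geotransform[5]
    return x, y

# Header values for LSTOA_Tanz_2000Wet.img (the y pixel size is negative
# because rows increase downwards from the top-left origin)
gt = (35.2128071515, 0.000271352299023, 0.0,
      -3.05897460167, 0.0, -0.000271352299023)

print(pixelToGeo(gt, 0, 0))      # top-left corner of the image
print(pixelToGeo(gt, 100, 200))  # 100 columns right, 200 rows down
```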
8.1.1
The GDAL software library provides a python interface to the C++ library, such that when the python functions are called it is the C++ implementation which is executed. This model has significant advantages for operations such as reading and writing to and from image files, as these operations would be slow in pure python but are very fast within C++. Python, however, is an easier language for people to learn and use, and therefore allows software to be developed more quickly, so combining C++ and python in this way is a very productive way to develop software.
Argparser
Up until this point we have read parameters from the system by just using the sys.argv list, where the user is required to enter the values in a given, pre-defined order. The problem with this is that it is not very helpful to the user, as no help is provided and no error messages are given if the wrong parameters are entered. For command line tools it is generally accepted that options are given using switches, such as -i or --input, with which the user specifies what each input they are providing is. Fortunately, python provides a library to simplify the implementation of this type of interface. An example of this is shown below, where first the argparse library is imported. The parser is then created and the arguments added to the parser, so that the parser knows what to expect from the user. Finally, the parser is called to parse the arguments. Examples will be shown in all the following scripts.
# Import the python Argument parser
import argparse
# Create the parser
parser = argparse.ArgumentParser()
# Define the argument for specifying the input file.
parser.add_argument("-i", "--input", type=str, help="Specify the input image file.")
# Define the argument for specifying the output file.
parser.add_argument("-o", "--output", type=str, help="Specify the output text file.")
# Call the parser to parse the arguments.
args = parser.parse_args()
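To see what parse_args produces, the parser can be given an explicit list instead of reading from the command line. In this sketch the file names are made up, and required=True (a standard add_argument option) is added so that argparse itself reports a missing input, rather than the script checking for None.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input", type=str, required=True,
                    help="Specify the input image file.")
parser.add_argument("-o", "--output", type=str,
                    help="Specify the output text file.")

# Parse an explicit list, as if it had been typed on the command line
args = parser.parse_args(["-i", "image.img", "--output", "header.txt"])
print(args.input, args.output)

# Options that are not given default to None
args2 = parser.parse_args(["--input", "image.img"])
print(args2.output)
```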
8.1.2
The following example demonstrates how to import the GDAL library into python, read the image header information and print it to the console, similar to the functionality of the gdalinfo command. Read the comments within the code and ensure you understand the steps involved.
#!/usr/bin/env python

# Import the GDAL python library
import osgeo.gdal as gdal
# Import the python Argument parser
import argparse
# Import the System library
import sys

# Define a function to read and print the images
# header information.
def printImageHeader(inputFile):
    # Open the dataset in Read Only mode
    dataset = gdal.Open(inputFile, gdal.GA_ReadOnly)
    # Check that the dataset has correctly opened
    if dataset is not None:
        # Print out the image file path.
        print(inputFile)
        # Print out the number of image bands.
        print("The image has ", dataset.RasterCount, " bands.")
        # Loop through all the image bands and print out the band name
        for n in range(dataset.RasterCount):
            print("\t", n+1, ":\t", dataset.GetRasterBand(n+1).GetDescription(), "\t")

        # Print out the image size in pixels
        print("Image Size [", dataset.RasterXSize, ",", dataset.RasterYSize, "]")

        # Get the geographic header
        # geotransform[0] = TL X Coordinate
        # geotransform[1] = X Pixel Resolution
        # geotransform[2] = X Rotation
        # geotransform[3] = TL Y Coordinate
        # geotransform[4] = Y Rotation
        # geotransform[5] = Y Pixel Resolution
        geotransform = dataset.GetGeoTransform()
        if geotransform is not None:
            # Print out the origin, pixel size and rotation
            print("Origin = (", geotransform[0], ",", geotransform[3], ")")
            print("Pixel Size = (", geotransform[1], ",", geotransform[5], ")")
            print("Rotation = (", geotransform[2], ",", geotransform[4], ")")

# This is the first part of the script to
# be executed.
if __name__ == '__main__':
    # Create the command line options
    # parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str, help="Specify the input image file.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()

    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input image file provided.")
        sys.exit()

    # Otherwise, run the function to print out the image header information.
    printImageHeader(args.input)
CHAPTER 8. IMAGE PROCESSING USING GDAL AND RIOS
Running the script
Run the script as you have done with others within these worksheets, as shown below. You need to provide the full path to the image file or copy the image file into the same directory as your script. This should result in output like that shown below:
> python ReadImageHeader.py -i LSTOA_Tanz_2000Wet.img
LSTOA_Tanz_2000Wet.img
The image has 6 bands.
1 : Band 1
2 : Band 2
3 : Band 3
4 : Band 4
5 : Band 5
6 : Band 6
Image Size [ 1776 , 1871 ]
Origin = ( 35.2128071515 , -3.05897460167 )
Pixel Size = ( 0.000271352299023 , -0.000271352299023 )
Rotation = ( 0.0 , 0.0 )
8.1.3 No Data Values
GDAL also allows us to edit the image header values; the following example shows how to edit the no data value for each image band. Note that when opening the image file, the gdal.GA_Update option is used rather than gdal.GA_ReadOnly. A no data value is useful for defining regions of the image which are not valid (i.e., outside of the image boundaries) and can be ignored during processing.
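As a sketch of how a no data value can be ignored during processing, the NumPy snippet below excludes pixels equal to an assumed no data value of 0 before computing a mean, first with a boolean mask and then with a masked array (numpy.ma). The pixel values are made up.

```python
import numpy as np

noDataVal = 0
# Made-up band values; 0 marks pixels outside the image boundary
band = np.array([[0, 0, 52],
                 [48, 61, 0],
                 [55, 0, 59]])

# Option 1: boolean mask selecting only the valid pixels
valid = band[band != noDataVal]
meanValid = valid.mean()

# Option 2: masked array that hides the no data pixels
masked = np.ma.masked_equal(band, noDataVal)
meanMasked = masked.mean()

# The plain mean is pulled down by the no data pixels
print(band.mean(), meanValid, meanMasked)
```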
Running the script
For the file provided (LSTOA_Tanz_2000Wet.img) the no data value for all the bands should be 0. Therefore, run the following command:
> python setnodata.py -i LSTOA_Tanz_2000Wet.img -n 0.0
Setting No data (0.0) for band 1
Setting No data (0.0) for band 2
Setting No data (0.0) for band 3
Setting No data (0.0) for band 4
Setting No data (0.0) for band 5
Setting No data (0.0) for band 6
To check that the command successfully edited the input file, use the gdalinfo command, as shown below:
gdalinfo -norat LSTOA_Tanz_2000Wet.img
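```python
#!/usr/bin/env python
# Import the GDAL python library
import osgeo.gdal as gdal
# Import the python Argument parser
import argparse
# Import the System library
import sys

# A function to set the no data value
# for each image band.
def setNoData(inputFile, noDataVal):
    # Open the image file, in update mode
    # so that the image can be edited.
    dataset = gdal.Open(inputFile, gdal.GA_Update)
    # Check that the image has been opened.
    if dataset is not None:
        # Iterate through the image bands
        # Note. i starts at 0 while the
        # band count in GDAL starts at 1.
        for i in range(dataset.RasterCount):
            # Print information to the user on what is
            # being set.
            print("Setting No data (" + str(noDataVal) + ") for band " + str(i+1))
            # Get the image band
            # the i+1 is because GDAL bands
            # start at 1.
            band = dataset.GetRasterBand(i+1)
            # Set the no data value for the band.
            band.SetNoDataValue(noDataVal)
        # Close the dataset so the changes are written to the file.
        dataset = None
    else:
        # Print an error message if the file
        # could not be opened.
        print("Could not open the input image file: ", inputFile)

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options
    # parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input image file.")
    # Define the argument for the no data value.
    # (The long option name --nodata is assumed; the
    # example command only uses -n.)
    parser.add_argument("-n", "--nodata", type=float, default=0.0,
                        help="Specify the no data value.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input image file provided.")
        sys.exit(1)
    # Run the function to set the no data value.
    setNoData(args.input, args.nodata)
```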
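Setting the no data value only records it in the image header; the processing code still has to honour it. A minimal sketch, assuming a band has already been read into a numpy array (the small array and the value 0 are illustrative), uses numpy's masked arrays so that statistics skip the no data pixels:

```python
import numpy as np


def band_mean_ignoring_nodata(band, no_data_value):
    """Return the mean of a band, ignoring pixels equal to no_data_value."""
    # Mask every pixel equal to the no data value.
    masked = np.ma.masked_equal(band, no_data_value)
    # The mean of a masked array only uses the unmasked (valid) pixels.
    return float(masked.mean())


band = np.array([[0, 0, 10],
                 [20, 30, 0]], dtype=np.float32)
print(band.mean())                          # includes the zero pixels
print(band_mean_ignoring_nodata(band, 0))   # valid pixels only
```

Without the mask the three no data pixels pull the mean down, which is exactly the kind of error a no data value is meant to prevent.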
8.1.4 Band Name
Band names make a data set easier for a user to understand, so naming the image bands (e.g., Blue, Green, Red, NIR and SWIR) is very useful. The following example illustrates how to edit the band name description.
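```python
#!/usr/bin/env python
# Import the GDAL python library
import osgeo.gdal as gdal
# Import the python Argument parser
import argparse
# Import the System library
import sys

# A function to set the name (description)
# of an image band.
def setBandName(inputFile, band, name):
    # Open the image file, in update mode
    # so that the image can be edited.
    dataset = gdal.Open(inputFile, gdal.GA_Update)
    # Check that the image has been opened.
    if dataset is not None:
        # Get the image band
        imgBand = dataset.GetRasterBand(band)
        # Check the image band was available.
        if imgBand is not None:
            # Set the image band name.
            imgBand.SetDescription(name)
        else:
            # Print out an error message.
            print("Could not open the image band: ", band)
        # Close the dataset so the change is written to the file.
        dataset = None
    else:
        # Print an error message if the file
        # could not be opened.
        print("Could not open the input image file: ", inputFile)

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options
    # parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input image file.")
    # Define the arguments for the band and its new name.
    # (The option names -b/--band and -n/--name are assumed.)
    parser.add_argument("-b", "--band", type=int,
                        help="Specify the image band (counting from 1).")
    parser.add_argument("-n", "--name", type=str,
                        help="Specify the new band name.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that all the parameters have been specified.
    if args.input is None or args.band is None or args.name is None:
        # Print an error message if not and exit.
        print("Error: The input image, band and name must all be specified.")
        sys.exit(1)
    # Run the function to set the band name.
    setBandName(args.input, args.band, args.name)
```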
Running the script
The file provided (LSTOA_Tanz_2000Wet.img) just has some default band names defined (i.e., Band 1), so use your script to change them to something more useful. Therefore, run the following commands:

Use your script for reading the image header values and printing them to the screen to find out whether it worked.
8.1.5 GDAL Meta-Data
GDAL supports the concept of meta-data on both the image bands and the whole image. Meta-data allows any other information to be stored within the image file as strings. The following example shows how to read meta-data values and how to list all the meta-data items available on an image band or on the image.
```python
#!/usr/bin/env python
# Import the GDAL python library
import osgeo.gdal as gdal
# Import the python Argument parser
import argparse
# Import the System library
import sys

# A function to read a meta-data item
# from an image band
def readBandMetaData(inputFile, band, name):
    # Open the dataset in Read Only mode
    dataset = gdal.Open(inputFile, gdal.GA_ReadOnly)
    # Check that the image has been opened.
    if dataset is not None:
        # Get the image band
        imgBand = dataset.GetRasterBand(band)
        # Check the image band was available.
        if imgBand is not None:
            # Print the value (None if the item is not set).
            print(name, "=", imgBand.GetMetadataItem(name))
        else:
            # Print out an error message.
            print("Could not open the image band: ", band)
    else:
        # Print an error message if the file
        # could not be opened.
        print("Could not open the input image file: ", inputFile)

# A function to read a meta-data item
# from the image (dataset) level
def readImageMetaData(inputFile, name):
    dataset = gdal.Open(inputFile, gdal.GA_ReadOnly)
    if dataset is not None:
        print(name, "=", dataset.GetMetadataItem(name))
    else:
        print("Could not open the input image file: ", inputFile)

# A function to list all the meta-data items
# of an image band
def listBandMetaData(inputFile, band):
    dataset = gdal.Open(inputFile, gdal.GA_ReadOnly)
    if dataset is not None:
        imgBand = dataset.GetRasterBand(band)
        if imgBand is not None:
            # GetMetadata() returns a dictionary of name/value strings.
            for key, value in imgBand.GetMetadata().items():
                print(key, "=", value)
        else:
            print("Could not open the image band: ", band)
    else:
        print("Could not open the input image file: ", inputFile)

# A function to list all the meta-data items
# of the image
def listImageMetaData(inputFile):
    dataset = gdal.Open(inputFile, gdal.GA_ReadOnly)
    if dataset is not None:
        for key, value in dataset.GetMetadata().items():
            print(key, "=", value)
    else:
        print("Could not open the input image file: ", inputFile)

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options
    # parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input image file.")
    # Define the argument for specifying image band.
    parser.add_argument("-b", "--band", type=int, default=0,
                        help="Specify image band.")
    # Define the argument for specifying meta-data name.
    parser.add_argument("-n", "--name", type=str,
                        help="Specify the meta-data name.")
    # Define the argument for specifying whether the
    # meta-data field should be just listed.
    parser.add_argument("-l", "--list", action="store_true", default=False,
                        help="Specify that meta data items should be listed.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input image file provided.")
        sys.exit(1)
    # Check whether the name parameter has been specified.
    # If it has, read the requested meta-data item.
    if args.name is not None:
        # The band defaults to 0 to indicate that it has not
        # been specified (GDAL band numbers start at 1), in
        # which case the image-level meta-data is used.
        if args.band == 0:
            # Run the function to print out the image meta-data value.
            readImageMetaData(args.input, args.name)
        else:
            # Otherwise, run the function to print out the band meta-data value.
            readBandMetaData(args.input, args.band, args.name)
    elif args.list:
        if args.band == 0:
            # Run the function to list image meta-data.
            listImageMetaData(args.input)
        else:
            # Otherwise, run the function to list band meta-data.
            listBandMetaData(args.input, args.band)
    else:
        # Print an error message if neither option was given and exit.
        print("Error: either the meta-data name (-n) or the list option (-l) must be specified.")
        sys.exit(1)
```
Running the script
This script has a number of options. Experiment with them on the image provided; some examples are shown below.
python ReadGDALMetaData.py -h
python ReadGDALMetaData.py -i LSTOA_Tanz_2000Wet.img -l
python ReadGDALMetaData.py -i LSTOA_Tanz_2000Wet.img -b 1 -l
python ReadGDALMetaData.py -i LSTOA_Tanz_2000Wet.img -b 1 -n LAYER_TYPE
python ReadGDALMetaData.py -i LSTOA_Tanz_2000Wet.img -b 3 -n STATISTICS_MEAN
8.2
The raster input and output (I/O) simplification (RIOS) library is a set of Python modules which makes it easier to write raster processing code in Python. Built on top of GDAL, it handles the details of opening and closing files, checking alignment of projections and raster grids, stepping through the raster in small blocks, etc., allowing the programmer to concentrate on implementing the solution to the problem rather than on how to access the raster data and deal with the spatial header. Although GDAL provides access to the image data through Python, RIOS makes it much more user friendly and easier to use. RIOS is available as a free download from https://bitbucket.org/chchrsc/rios/overview
8.2.1
Python provides a very useful help system through the command line. To access the help, run python from the terminal:
> python
To exit the help system just press the q key on the keyboard.
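Inside the interpreter, calling help() on a module, class or function opens this pager (for example, help(applier) after from rios import applier). The same text can also be obtained as a string with the standard pydoc module. The sketch below uses math.sqrt only as a stand-in so that it runs without RIOS installed:

```python
import math
import pydoc

# Render the help text for a function as plain text
# (the same content that help() shows in the pager).
text = pydoc.render_doc(math.sqrt, renderer=pydoc.plaintext)
print(text)
```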
8.2.2 Band Maths
Being able to apply equations that combine image bands or images, or that scale single bands, is a key tool in remote sensing, for example to calibrate Landsat data to radiance. The following examples demonstrate how to do this within the RIOS framework.
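Because RIOS passes each block to the applier function as a numpy array with dimensions (bands, rows, cols), per-band equations can rely on numpy broadcasting. The sketch below applies a linear gain and offset to each band, which is the general form of a digital-number-to-radiance conversion; the function name and the gain and offset values are made up for illustration and are not real Landsat calibration coefficients.

```python
import numpy as np


def apply_gain_offset(block, gains, offsets):
    """Apply value * gain + offset separately to each band.

    block   : array with dimensions (bands, rows, cols)
    gains   : one gain per band
    offsets : one offset per band
    """
    # Reshape to (bands, 1, 1) so each value lines up with one band.
    gains = np.asarray(gains, dtype=np.float32).reshape(-1, 1, 1)
    offsets = np.asarray(offsets, dtype=np.float32).reshape(-1, 1, 1)
    # Broadcasting stretches the (bands, 1, 1) arrays across every
    # row and column of the block; converting to float32 first
    # avoids integer overflow and truncation.
    return block.astype(np.float32) * gains + offsets


block = np.ones((2, 3, 4), dtype=np.uint8) * 10
out = apply_gain_offset(block, gains=[0.5, 2.0], offsets=[1.0, -5.0])
print(out[:, 0, 0])
```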
8.2.3 Multiply by a constant
The first example just multiplies all the image bands by a constant (provided by the user). The first part of the code reads the user's parameters (input file, output file and scale factor). To use the applier interface within RIOS you first need to set up the input and output file associations and then any other options required, in this case the constant for multiplication. The controls object should also be defined to set any other parameters. All processing within RIOS is undertaken on blocks, by default 200 × 200 pixels in size. To process each block an applier function needs to be defined (e.g., multiplyByValue), to which the inputs and outputs (the pixel values) and the other arguments object defined previously are passed. The pixel values are represented as a numpy array with dimensions (n, y, x), where n is the number of image bands, y is the number of rows and x the number of columns in the block. Because numpy applies an arithmetic operation to every element of an array, multiplying the whole block by a constant (e.g., 2) needs only the single line of syntax shown below, which makes it very simple.
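```python
#!/usr/bin/env python
import sys
# Import the python Argument parser
import argparse
# Import the RIOS applier interface
from rios import applier
# Import the RIOS progress feedback
from rios import cuiprogress

# Define the applier function, which RIOS calls once per block.
# inputs.image1 is a numpy array with dimensions (bands, rows, cols).
def multiplyByValue(info, inputs, outputs, otherargs):
    # Numpy multiplies every pixel in every band of the block.
    outputs.outimage = inputs.image1 * otherargs.multiVal

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    # (The long option names and the otherargs attribute name
    # are assumed; the example command uses -i, -o and -m.)
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input image file.")
    parser.add_argument("-o", "--output", type=str,
                        help="Specify the output image file.")
    parser.add_argument("-m", "--multiply", type=float, default=1.0,
                        help="Specify the value to multiply the image by.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that the input and output files have been specified.
    if args.input is None or args.output is None:
        print("Error: Both an input and an output image file must be provided.")
        sys.exit(1)
    # Create input files file names associations
    infiles = applier.FilenameAssociations()
    infiles.image1 = args.input
    # Create output files file names associations
    outfiles = applier.FilenameAssociations()
    outfiles.outimage = args.output
    # Other arguments passed to the applier function
    otherargs = applier.OtherInputs()
    otherargs.multiVal = args.multiply
    # Create a controls object and set the progress feedback.
    aControls = applier.ApplierControls()
    aControls.progress = cuiprogress.CUIProgressBar()
    # Apply the function to every block of the image.
    applier.apply(multiplyByValue, infiles, outfiles, otherargs,
                  controls=aControls)
```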
Run the Script
Run the script using the following command; the input image is a Landsat scene and all the pixel values will be multiplied by 2.

python MultiplyRIOSExample.py -i LSTOA_Tanz_2000Wet.img \
    -o LSTOA_Tanz_2000Wet_Multiby2.img -m 2
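RIOS does the block-by-block reading and writing for you. To show why a per-pixel operation such as this multiplication gives the same answer whether it is applied to the whole image or one block at a time, here is a small numpy-only sketch of the idea. It is not how RIOS is implemented; the function name and the tiny 2 × 2 block size are illustrative (RIOS uses 200 × 200 blocks by default).

```python
import numpy as np


def process_in_blocks(image, func, block_size=200):
    """Apply func to an image of shape (bands, rows, cols) one block at a time."""
    n_bands, n_rows, n_cols = image.shape
    output = None
    for row in range(0, n_rows, block_size):
        for col in range(0, n_cols, block_size):
            # Blocks at the right and bottom edges may be smaller.
            block = image[:, row:row + block_size, col:col + block_size]
            result = func(block)
            if output is None:
                # Allocate the output once the band count and data
                # type of the result are known from the first block.
                output = np.empty((result.shape[0], n_rows, n_cols),
                                  dtype=result.dtype)
            output[:, row:row + block_size, col:col + block_size] = result
    return output


image = np.arange(2 * 5 * 5, dtype=np.float32).reshape(2, 5, 5)
blocked = process_in_blocks(image, lambda b: b * 2, block_size=2)
print(np.array_equal(blocked, image * 2))
```

Operations that need neighbouring pixels (such as the filters in Section 8.3) are different: a block then needs an overlap with its neighbours, otherwise the block edges are computed with missing context.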
8.2.4 Calculate NDVI
Calculating a new value from individual image bands, such as an index like the NDVI,

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \tag{8.1}$$

requires that the bands are referenced independently within the input data. Using numpy to calculate the index, as shown below, results in a single output array with the row and column dimensions of the block but without the third dimension (i.e., the band), which RIOS requires to identify how to create the output image. Therefore, as you will see in the example below, an extra dimension needs to be added before outputting the data to the file. Within the example the input pixel values are converted to floating point values (rather than whatever type they were read in as) because the output will be a floating point number (i.e., NDVI has a range of -1 to 1).
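```python
#!/usr/bin/env python
# Import the system library
import sys
# Import the python Argument parser
import argparse
# Import the RIOS applier interface
from rios import applier
from rios import cuiprogress
# Import the numpy library
import numpy

# Define the applier function
def calcNDVI(info, inputs, outputs, otherargs):
    # Convert the input data to Float32
    # This is because the output is a float due to the
    # divide within the NDVI calculation.
    img = inputs.image1.astype(numpy.float32)
    nir = img[otherargs.nirband]
    red = img[otherargs.redband]
    # Calculate the NDVI for the block.
    # Numpy applies the calculation to all the individual
    # pixel values at once; within python this is very
    # important as python loops are slow.
    # Pixels where NIR + RED is 0 are left as 0 rather
    # than dividing by zero.
    total = nir + red
    ndvi = numpy.zeros_like(total)
    numpy.divide(nir - red, total, out=ndvi, where=(total != 0))
    # Add an extra dimension to the output array.
    # The output array needs to have 3 dimensions
    # (No Bands, Y Pixels (Rows), X Pixels (Cols)).
    outputs.outimage = numpy.expand_dims(ndvi, axis=0)

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    # (The long option names are assumed; the example
    # command uses -i, -o, -r and -n.)
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input image file.")
    parser.add_argument("-o", "--output", type=str,
                        help="Specify the output image file.")
    parser.add_argument("-r", "--red", type=int,
                        help="Specify the red image band (counting from 1).")
    parser.add_argument("-n", "--nir", type=int,
                        help="Specify the NIR image band (counting from 1).")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that all the parameters have been specified.
    if (args.input is None or args.output is None
            or args.red is None or args.nir is None):
        print("Error: The input, output, red band and NIR band must all be specified.")
        sys.exit(1)
    # Create input files file names associations
    infiles = applier.FilenameAssociations()
    infiles.image1 = args.input
    # Create output files file names associations
    outfiles = applier.FilenameAssociations()
    outfiles.outimage = args.output
    # Other arguments passed to the applier function.
    # GDAL band numbers start at 1 but numpy indices start at 0.
    otherargs = applier.OtherInputs()
    otherargs.redband = args.red - 1
    otherargs.nirband = args.nir - 1
    # Create a controls object and set the progress feedback.
    aControls = applier.ApplierControls()
    aControls.progress = cuiprogress.CUIProgressBar()
    # Apply the NDVI function to every block of the image.
    applier.apply(calcNDVI, infiles, outfiles, otherargs, controls=aControls)
```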
Run the Script
Run the script using the following command. The input image is a Landsat scene, so the red band is band 3 and the NIR band is band 4.

python RIOSExampleNDVI.py -i LSTOA_Tanz_2000Wet.img \
    -o LSTOA_Tanz_2000Wet_NDVI.img -r 3 -n 4
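The NDVI calculation itself can be tried on a small array without RIOS or an image file. The sketch below (the function name and the toy values are illustrative) computes Equation 8.1 for bands given as 0-based indices into a (bands, rows, cols) block, sets pixels where NIR + RED is zero to 0 instead of dividing by zero, and adds the leading band dimension that RIOS expects for the output.

```python
import numpy as np


def ndvi_block(block, red_index, nir_index):
    """Return NDVI for a (bands, rows, cols) block as a (1, rows, cols) array."""
    # Convert to float first: integer pixel types would truncate the
    # division and unsigned types would wrap around in NIR - RED.
    data = block.astype(np.float32)
    red = data[red_index]
    nir = data[nir_index]
    total = nir + red
    ndvi = np.zeros_like(total)
    # Only divide where the denominator is non-zero; other pixels stay 0.
    np.divide(nir - red, total, out=ndvi, where=(total != 0))
    # Add the band dimension: (rows, cols) -> (1, rows, cols).
    return np.expand_dims(ndvi, axis=0)


# Four-band toy block: Landsat band 3 (red) is index 2 and
# band 4 (NIR) is index 3 once counting starts from 0.
block = np.zeros((4, 1, 2), dtype=np.uint8)
block[2] = [[20, 0]]
block[3] = [[60, 0]]
print(ndvi_block(block, red_index=2, nir_index=3))
```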
8.2.5
Where multiple input files are required, in this case because the NIR and red bands are stored in different image files, the input files need to be specified in the input files association as image1, image2, etc., and the pixel values within the applier function are referenced in the same way. Because in this example each image has only a single band, the input arrays already have the same dimensions as the output, so no extra dimension needs to be added.
```python
#!/usr/bin/env python
# Import the system library
import sys
# Import the python Argument parser
import argparse
# Import the RIOS applier interface
from rios import applier
# Import the RIOS progress feedback
from rios import cuiprogress
# Import the numpy library
import numpy

# Define the applier function
def calcNDVI(info, inputs, outputs):
    # Convert the input data to Float32
    # This is because the output is a float due to the
    # divide within the NDVI calculation.
    red = inputs.image1.astype(numpy.float32)
    nir = inputs.image2.astype(numpy.float32)
    # Calculate the NDVI for the block.
    # Numpy applies the calculation to all the individual
    # pixel values at once; within python this is very
    # important as python loops are slow.
    # Pixels where NIR + RED is 0 are left as 0 rather
    # than dividing by zero.
    total = nir + red
    ndvi = numpy.zeros_like(total)
    numpy.divide(nir - red, total, out=ndvi, where=(total != 0))
    outputs.outimage = ndvi

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options
    # parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the output file.
    parser.add_argument("-o", "--output", type=str,
                        help="Specify the output image file.")
    # Define the arguments for the red and NIR input files.
    # (The long option names are assumed; the example
    # command uses -o, -r and -n.)
    parser.add_argument("-r", "--red", type=str,
                        help="Specify the red input image file.")
    parser.add_argument("-n", "--nir", type=str,
                        help="Specify the NIR input image file.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that the red input parameter has been specified.
    if args.red is None:
        # Print an error message if not and exit.
        print("Error: No red input image file provided.")
        sys.exit(1)
    # Check that the NIR input parameter has been specified.
    if args.nir is None:
        # Print an error message if not and exit.
        print("Error: No NIR input image file provided.")
        sys.exit(1)
    # Check that the output parameter has been specified.
    if args.output is None:
        # Print an error message if not and exit.
        print("Error: No output image file provided.")
        sys.exit(1)
    # Create input files file names associations
    infiles = applier.FilenameAssociations()
    # Set images to the input image specified
    infiles.image1 = args.red
    infiles.image2 = args.nir
    # Create output files file names associations
    outfiles = applier.FilenameAssociations()
    # Set outImage to the output image specified
    outfiles.outimage = args.output
    # Create a controls objects
    aControls = applier.ApplierControls()
    # Set the progress object.
    aControls.progress = cuiprogress.CUIProgressBar()
    # Apply the NDVI function. No other arguments are passed,
    # so the function is called as calcNDVI(info, inputs, outputs).
    applier.apply(calcNDVI, infiles, outfiles, controls=aControls)
```
Run the Script
Run the script using the following command. Here the red (band 3) and NIR (band 4) bands of the Landsat scene are provided as separate single-band image files.

python RIOSExampleMultiFileNDVI.py -o LSTOA_Tanz_2000Wet_MultiIn_NDVI.img \
    -r LSTOA_Tanz_2000Wet_Red.img -n LSTOA_Tanz_2000Wet_NIR.img
8.3 Filtering Images
Filtering an image is done through a windowing operation, where a window of pixels, such as 3 × 3 or 5 × 5 (the size needs to be an odd number), is selected and a new value for the centre pixel is calculated using all the pixel values within the window. In this example a median filter will be used, so the centre pixel value will be replaced with the median value of the window. SciPy (http://www.scipy.org) is another library of Python functions, which is paired with numpy and provides many useful functions we can use when processing images or other datasets within Python. The ndimage module (http://docs.scipy.org/doc/scipy/reference/tutorial/ndimage.html) provides many functions which can be applied to images in the same way as the median filter is used in the example below. I strongly recommend you look through the SciPy documentation to get an idea of the types of functions which are available.
```python
# ...
        else:
            # If the writer is created write the
            # output block to the file.
            writer.write(out)
    # Close the writer and calculate
    # the image statistics.
    writer.close(calcStats=True)

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options
    # parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input image file.")
    # Define the argument for specifying the output file.
    parser.add_argument("-o", "--output", type=str,
                        help="Specify the output image file.")
    # Define the argument for the size of the image filter.
    parser.add_argument("-s", "--size", default=3, type=int,
                        help="Filter size.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input image file provided.")
        sys.exit(1)
    # Check that the output parameter has been specified.
    if args.output is None:
        # Print an error message if not and exit.
        print("Error: No output image file provided.")
        sys.exit(1)
    # ...
```
After you have run this command, open the images in the image viewer and flick between them to observe the change in the image. What do you notice?
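To experiment with the median filter outside RIOS, it can be sketched with numpy alone. This is an illustrative implementation, not the SciPy one: it pads the band by mirroring the edge pixels (numpy's "symmetric" mode, which should correspond to the default "reflect" mode of scipy.ndimage.median_filter) and then takes the median of every size × size window. It needs numpy 1.20 or later for sliding_window_view; in practice scipy.ndimage.median_filter(band, size=3) is the simpler and faster choice.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_2d(band, size=3):
    """Median-filter a 2-D array using a size x size window."""
    if size % 2 == 0:
        raise ValueError("Filter size must be an odd number.")
    half = size // 2
    # Mirror the edges so that edge pixels also have a full window.
    padded = np.pad(band, half, mode="symmetric")
    # View of shape (rows, cols, size, size): one window per pixel.
    windows = sliding_window_view(padded, (size, size))
    # The median over the last two axes gives the filtered value.
    return np.median(windows, axis=(-2, -1))


# A single bright "noise" pixel is removed by a 3 x 3 median filter.
band = np.zeros((5, 5))
band[2, 2] = 100.0
print(median_filter_2d(band, 3))
```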
8.4
Another option is to use the where function within numpy to select pixels meeting certain criteria (e.g., pixels with an NDVI < 0.2 are not vegetation) and classify them accordingly, where pixel values are used to indicate the corresponding class (e.g., 1 = Forest, 2 = Water, 3 = Grass, etc.). Images where pixel values are not continuous but represent categories are referred to as thematic images, and there is a header value that can be set to indicate this type of image. Therefore, in the script below there is a function for setting the image band metadata field LAYER_TYPE to be thematic. Setting an image as thematic means that the nearest neighbour algorithm will be used when calculating pyramids and that histograms are binned using single whole values. It also means that a colour table (see Chapter 9) can be added.

To build the rule base the output pixel values need to be created, here using the numpy function zeros (http://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html). The function zeros creates a numpy array of the requested shape (in this case the shape is taken from the input image) where all the pixels have a value of zero. Using the where function (http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html) a logical statement can be applied to an array or set of arrays (which must be of the same size) to select the pixels for which the statement is true. With a single argument, the where function returns a tuple of index arrays (one per dimension), which can be used to address another array (i.e., the output array) and set a suitable output value (i.e., the classification code).
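```python
#!/usr/bin/env python
# Import the system library
import sys
# Import the python Argument parser
import argparse
# Import the RIOS applier interface
from rios import applier
# Import the RIOS progress feedback
from rios import cuiprogress
# Import the numpy library
import numpy
# Import the GDAL library
from osgeo import gdal

# Define the applier function
def rulebaseClassifier(info, inputs, outputs):
    # The NDVI is the first (and only) band of the input.
    ndvi = inputs.image1[0]
    # Create an output array with the same dims
    # as a single band of the input file.
    # Class codes are small whole numbers, so 8-bit integers are used.
    out = numpy.zeros(ndvi.shape, dtype=numpy.uint8)
    # Use where statements to select the
    # pixels to be classified. Give them an
    # integer value (i.e., 1, 2, 3, 4) to
    # specify the class. The ranges do not overlap
    # and leave no gaps at the threshold values.
    out[numpy.where((ndvi >= 0.4) & (ndvi < 0.7))] = 1
    out[numpy.where(ndvi < 0.1)] = 2
    out[numpy.where((ndvi >= 0.1) & (ndvi < 0.4))] = 3
    out[numpy.where(ndvi >= 0.7)] = 4
    # Expand the output array to include a single
    # image band and set as the output dataset.
    outputs.outimage = numpy.expand_dims(out, axis=0)

# A function to set the LAYER_TYPE meta-data item
# of every image band to "thematic".
def setThematic(imageFile):
    # Open the image in update mode so it can be edited.
    dataset = gdal.Open(imageFile, gdal.GA_Update)
    if dataset is not None:
        for n in range(dataset.RasterCount):
            dataset.GetRasterBand(n+1).SetMetadataItem("LAYER_TYPE", "thematic")
        # Close the dataset so the change is written to the file.
        dataset = None
    else:
        print("Could not open the image file: ", imageFile)

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input (NDVI) image file.")
    parser.add_argument("-o", "--output", type=str,
                        help="Specify the output image file.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that the input and output files have been specified.
    if args.input is None or args.output is None:
        print("Error: Both an input and an output image file must be provided.")
        sys.exit(1)
    # Create the input and output file name associations.
    infiles = applier.FilenameAssociations()
    infiles.image1 = args.input
    outfiles = applier.FilenameAssociations()
    outfiles.outimage = args.output
    # Create a controls object and set the progress feedback.
    aControls = applier.ApplierControls()
    aControls.progress = cuiprogress.CUIProgressBar()
    # Apply the classifier to every block of the image.
    applier.apply(rulebaseClassifier, infiles, outfiles, controls=aControls)
    # Mark the output image as thematic.
    setThematic(args.output)
```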
Run the Script
Run the script with one of the NDVI layers you previously calculated. To see the result it is recommended that a colour table is added (see the next worksheet); the easiest way to do that is to use the gdalcalcstats command, as shown below.
python RuleBaseClassification.py -i LSTOA_Tanz_2000Wet_NDVI.img \
    -o LSTOA_Tanz_2000Wet_classification.img
# Run gdalcalcstats to add a random colour table
gdalcalcstats LSTOA_Tanz_2000Wet_classification.img
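To see what numpy.where returns, and that a boolean mask selects the same pixels, here is a small self-contained sketch with made-up NDVI values and the same threshold values as the rule base in the script (the function name is illustrative):

```python
import numpy as np

ndvi = np.array([[0.05, 0.25],
                 [0.55, 0.85]])

# With a single argument, numpy.where returns a tuple of index
# arrays (one per dimension) for the pixels where the test is true.
rows, cols = np.where(ndvi < 0.1)
print(rows, cols)


def classify_ndvi(ndvi):
    """Assign class codes 1 to 4 using NDVI thresholds."""
    out = np.zeros(ndvi.shape, dtype=np.uint8)
    out[np.where((ndvi >= 0.4) & (ndvi < 0.7))] = 1
    # A boolean mask selects the same pixels as numpy.where's indices.
    out[ndvi < 0.1] = 2
    out[(ndvi >= 0.1) & (ndvi < 0.4)] = 3
    out[ndvi >= 0.7] = 4
    return out


print(classify_ndvi(ndvi))
```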
8.5 Exercises
1. Create a rule based classification using multiple image bands.
2. Create a rule based classification using image bands from different input images.
3. Using the previous worksheet as a basis, create a script which calls the gdalwarp command to resample an input image to the same pixel resolution as another image, where the header is read as shown in this worksheet.
8.6 Further Reading
GDAL - http://www.gdal.org
Python Documentation - http://www.python.org/doc
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall, ISBN 0-13-226993-7
Learn UNIX in 10 minutes - http://freeengineer.org/learnUNIXin10minutes.html
SciPy - http://www.scipy.org/SciPy
NumPy - http://numpy.scipy.org
RIOS - https://bitbucket.org/chchrsc/rios/wiki/Home
9.1 Reading Columns
To access the raster attribute table (RAT) using RIOS, you need to import the rat module. The rat module provides a simple interface for reading and writing columns. When a column is read it is returned as a numpy array with one element per row of the attribute table (i.e., n values, where n is the number of rows). As shown in the example below, reading a column is just a single function call specifying the input image file and the column name.
Run the Script
Run the script as follows. The example below prints the Histogram column; use the viewer to see what other columns are within the attribute table.
python ReadRATColumn.py -i WV2_525N040W_2m_segments.kea -n Histogram
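In a raster attribute table, row i holds the attributes of the pixels (or clumps) with value i, so a column read with rat.readColumn can be turned back into an image by indexing it with the pixel values. A minimal numpy sketch, with a hypothetical three-row column and a tiny clump image:

```python
import numpy as np

# Hypothetical attribute column: one value per row of the RAT,
# e.g. the mean NDVI of clumps 0, 1 and 2.
column = np.array([0.0, 0.35, 0.8])

# A tiny clump image: each pixel stores the row (clump ID) it belongs to.
clumps = np.array([[1, 1, 2],
                   [0, 2, 2]])

# Fancy indexing looks up every pixel's row in the column at once.
pixel_values = column[clumps]
print(pixel_values)
```

The same lookup works for any column, so a per-clump classification stored in the RAT can be viewed as a per-pixel image without looping over the pixels.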
9.2 Writing Columns
Writing a column is also quite straightforward, requiring just a numpy array with one element per row containing the data to be written, the image file path and the name of the column to be written to.
9.2.1
The first example reads a column from the input image, multiplies it by 2 and writes it to the image file as a new column.
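```python
#!/usr/bin/env python
# Import the system library
import sys
# Import the RIOS rat library.
from rios import rat
# Import the python Argument parser
import argparse

# The function to multiply the input column
# by 2 and write the result to a new column.
def multiplyColumn(inputImage, inputColumn, outputColumn):
    # Read the input column; one value per row of the RAT.
    colVals = rat.readColumn(inputImage, inputColumn)
    # Multiply every row by 2 (numpy applies this to the whole array).
    outColVals = colVals * 2
    # Write the values to the attribute table as a new column.
    rat.writeColumn(inputImage, outputColumn, outColVals)

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    # (The function name above and the long option names below
    # are assumed; the example command uses -i, -c and -o.)
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input image file.")
    parser.add_argument("-c", "--inputcol", type=str,
                        help="Specify the name of the input column.")
    parser.add_argument("-o", "--outputcol", type=str,
                        help="Specify the name of the output column.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that all the parameters have been specified.
    if args.input is None or args.inputcol is None or args.outputcol is None:
        print("Error: The input image, input column and output column must all be specified.")
        sys.exit(1)
    # Run the function to multiply the column.
    multiplyColumn(args.input, args.inputcol, args.outputcol)
```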
Run the Script
Run the script as follows. In this simple case the histogram will be multiplied by 2 and saved as a new column.
python MultiplyColumn.py -i WV2_525N040W_2m_segments.kea -c Histogram -o HistoMulti2
9.2.2
A useful column to have within the attribute table, where a classification has been undertaken, is class names. This allows a user to click on the image and, rather than having to remember which codes correspond to which class, be shown a class name. To add class names to the attribute table a new column needs to be created, where the data type is set to be ASCII (string). To do this, a copy of the histogram column is made where the new numpy array is empty, of type string and the same length as the histogram. The following line uses the ... syntax within the array index to specify all elements of the array, such that they are all set to a value of NA. Once the new column has been created, the class names can be simply defined by referencing the appropriate array index.
```python
#!/usr/bin/env python
# Import the system library
import sys
# Import the RIOS rat library.
from rios import rat
# ...
```
Run the script as follows, using the classification you produced at the end of worksheet 8.
python AddClassNames.py -i LSTOA_Tanz_2000Wet_classification.img
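The steps described for adding class names (an empty string array the same length as the histogram, filled with "NA" using the ... syntax, then names set by index) can be tried with numpy alone. In the sketch below the histogram values are made up and the class names are only the example codes used earlier in this chapter (1 = Forest, 2 = Water, 3 = Grass); in a real script the resulting array would be written back to the image with rat.writeColumn.

```python
import numpy as np

# Hypothetical histogram column: one row per pixel value (0 to 4).
histogram = np.array([500.0, 120.0, 80.0, 300.0, 45.0])

# Create an empty string column with the same length as the histogram.
class_names = np.empty_like(histogram, dtype="U255")
# The ... (Ellipsis) index selects every element, so all rows become "NA".
class_names[...] = "NA"

# Set names for the class codes by indexing the rows directly.
class_names[1] = "Forest"
class_names[2] = "Water"
class_names[3] = "Grass"
print(class_names)
```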
9.3
Another useful tool is being able to add a colour table to an image, so that classes are displayed in colours that make interpretation easier. The following script adds a colour table to the per-pixel classification undertaken at the end of the previous worksheet, which was given class names using the previous script. The colour table is represented as an n × 5 array, where n is the number of colours in the colour table. The 5 values associated with each colour are:

1. Image Pixel Value
2. Red (0 to 255)
3. Green (0 to 255)
4. Blue (0 to 255)
5. Opacity (0 to 255)

An opacity of 0 means completely transparent and 255 means solid with no transparency (opacity is sometimes also referred to as alpha or the alpha channel).
```python
# ...
    rat.setColorTable(imageFile, ct)

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options
    # parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str,
                        help="Specify the input image file.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()
    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input image file provided.")
        sys.exit(1)
    # Run the add colour table function
    addColourTable(args.input)
```
Run the Script
Run the script as follows, using the classification you produced at the end of worksheet 8.
To find the Red, Green and Blue (RGB) values to use with the colour table, there are many websites available online that provide lists of these colours (e.g., http://cloford.com/resources/colours/500col.htm).
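Building the n × 5 array described above is straightforward with numpy. The sketch below (the function name and the colours are illustrative) turns a dictionary of pixel value to (red, green, blue, opacity) into that layout and checks that each component lies in the 0 to 255 range; an array of this shape is what the rat.setColorTable(imageFile, ct) call in the script receives.

```python
import numpy as np


def make_colour_table(colours):
    """Build an n x 5 colour table from {pixel value: (r, g, b, alpha)}.

    Columns are: pixel value, red, green, blue, opacity.
    """
    ct = np.zeros((len(colours), 5), dtype=np.int32)
    for row, value in enumerate(sorted(colours)):
        rgba = colours[value]
        # Each colour needs four components, each between 0 and 255.
        if len(rgba) != 4 or not all(0 <= c <= 255 for c in rgba):
            raise ValueError("Colour for value %d must be four values in 0-255" % value)
        ct[row, 0] = value
        ct[row, 1:] = rgba
    return ct


ct = make_colour_table({
    0: (0, 0, 0, 0),         # no data: fully transparent
    1: (34, 139, 34, 255),   # forest green, opaque
    2: (0, 0, 255, 255),     # blue, opaque
})
print(ct)
```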
9.4
To use a RAT to undertake a rule based object oriented classification, the first step is to create a set of image clumps (e.g., through segmentation; see Appendix A, Section A.3), then the rows of the attribute table need populating with information (e.g., see Appendix A, Section A.4). Once these steps have been completed, a rule base using numpy where statements can be created and executed, resulting in a process similar to that of the eCognition software.
9.4.1
This is a similar process to developing a rule based classification within eCognition, where the clumps are the segments/objects and the columns are the features, such as mean, standard deviation, etc. Using where statements, similar to those used within the rule based classification of the image pixels, the clumps can be classified. The example shown below illustrates the classification of levels 1 to 3 of the Land Cover Classification System (LCCS), where the classes are represented by string names within class name columns of the attribute table. The classification is undertaken using three dates of WorldView-2 imagery captured over Cors Fochno, in Wales, UK. A segmentation has been provided and the segments have been populated with mean reflectance values from the three WorldView-2 images, the DTM minimum and maximum and the CHM height.
1 2
#!/usr/bin/env python
135
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
# A function for classifying the first part of level 1 def classifyLevel1FromImg(urbanMask, wbiPeak, fdiPeak, wbiPost, fdiPost, wbiPre, fdiPre, psriPre, repPeak): # Create Output Array l1P1 = numpy.empty_like(urbanMask, dtype=numpy.dtype(a255)) l1P1[...] = "NA" # Urban l1P1 = numpy.where(numpy.logical_and(l1P1 == "NA", urbanMask > URBAN_MASK_THRES), "Urban", l1P1) # Water l1P1 = numpy.where(numpy.logical_and(l1P1 == "NA", numpy.logical_or(wbiPre >= WBI_PRE_THRES, wbiPeak >= WBI_PEAK_THRES)), "Water", l1P1)
136
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
# A function for classifying the second part of level 1 def classifyLevel1Assign(classLevel1Img): # Create Output Array level1 = numpy.empty_like(classLevel1Img, dtype=numpy.dtype(a255)) level1[...] = "NA" # Non Vegetated level1 = numpy.where(numpy.logical_or(classLevel1Img == "NA", numpy.logical_or(classLevel1Img == "Water", classLevel1Img == "Urban")), "Non Vegetated", level1) # Vegetated level1 = numpy.where(numpy.logical_or( classLevel1Img == "Photosynthetic Vegetated", classLevel1Img == "Non Photosynthetic Vegetated", classLevel1Img == "Non Submerged Aquatic Vegetated"), "Vegetated", level1) return level1 # A function for classifying level 2 def classifyLevel2(wbiPre, wbiPeak, wbiPost, classLevel1Img): # Create Output Array level2 = numpy.empty_like(classLevel1Img, dtype=numpy.dtype(a255)) level2[...] = "NA" # Terrestrial Non Vegetated
137
85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
    level2 = numpy.where(numpy.logical_or(classLevel1Img == "NA",
                                          classLevel1Img == "Urban"),
                         "Terrestrial Non Vegetated", level2)
    # Aquatic Non Vegetated
    level2 = numpy.where(numpy.logical_and(
                             numpy.logical_not(classLevel1Img == "Urban"),
                             numpy.logical_or(wbiPre > 1, wbiPeak > 1)),
                         "Aquatic Non Vegetated", level2)
    # Terrestrial Vegetated
    level2 = numpy.where(numpy.logical_or(
                             classLevel1Img == "Photosynthetic Vegetated",
                             classLevel1Img == "Non Photosynthetic Vegetated"),
                         "Terrestrial Vegetated", level2)
    # Aquatic Vegetated
    level2 = numpy.where(classLevel1Img == "Non Submerged Aquatic Vegetated",
                         "Aquatic Vegetated", level2)
    return level2

# A function for classifying level 3
def classifyLevel3(classLevel2, cult, urban):
    # Create the output array
    level3 = numpy.empty_like(classLevel2, dtype=numpy.dtype("a255"))
    level3[...] = "NA"
    # Cultivated Terrestrial Vegetated
    level3 = numpy.where(numpy.logical_and(
                             classLevel2 == "Terrestrial Vegetated",
                             cult > CULT_MASK_THRES),
                         "Cultivated Terrestrial Vegetated", level3)
    # Natural Terrestrial Vegetated
    level3 = numpy.where(numpy.logical_and(
                             numpy.logical_not(level3 == "Cultivated Terrestrial Vegetated"),
                             classLevel2 == "Terrestrial Vegetated"),
                         "Natural Terrestrial Vegetated", level3)
    # Cultivated Aquatic Vegetated
    level3 = numpy.where(numpy.logical_and(
                             classLevel2 == "Aquatic Vegetated",
                             cult > CULT_MASK_THRES),
                         "Cultivated Aquatic Vegetated", level3)
    # Natural Aquatic Vegetated
    level3 = numpy.where(numpy.logical_and(
                             numpy.logical_not(level3 == "Cultivated Aquatic Vegetated"),
                             classLevel2 == "Aquatic Vegetated"),
                         "Natural Aquatic Vegetated", level3)
    # Artificial Surface
    level3 = numpy.where(numpy.logical_and(
                             classLevel2 == "Terrestrial Non Vegetated",
                             urban > URBAN_MASK_THRES),
                         "Artificial Surface", level3)
    # Natural Surface
    level3 = numpy.where(numpy.logical_and(
                             numpy.logical_not(level3 == "Artificial Surface"),
                             classLevel2 == "Terrestrial Non Vegetated"),
                         "Natural Surface", level3)
    # Natural Water
    level3 = numpy.where(classLevel2 == "Aquatic Non Vegetated",
                         "Natural Water", level3)
    return level3

def runClassification(fname):
    # Open the GDAL dataset once and reuse it, rather than
    # having each RIOS call reopen the image file, which
    # can be slow for images with large attribute tables.
    ratDataset = gdal.Open(fname, gdal.GA_Update)

    # Check the image file was opened correctly.
    if ratDataset is not None:
        # Provide feedback to the user.
        print("Import Columns.")
        urban = rat.readColumn(ratDataset, "PropUrban")
        cult = rat.readColumn(ratDataset, "PropCult")

        # Read in the RAT columns for the Pre-flush image.
        PreCoastal = rat.readColumn(ratDataset, "MarB1")
        PreBlue = rat.readColumn(ratDataset, "MarB2")
        PreRed = rat.readColumn(ratDataset, "MarB5")
        PreRedEdge = rat.readColumn(ratDataset, "MarB6")
        PreNIR1 = rat.readColumn(ratDataset, "MarB7")

        # Read in the RAT columns for the Peak-flush image.
        PeakCoastal = rat.readColumn(ratDataset, "JulyB1")
        PeakBlue = rat.readColumn(ratDataset, "JulyB2")
        PeakRed = rat.readColumn(ratDataset, "JulyB5")
        PeakRedEdge = rat.readColumn(ratDataset, "JulyB6")
        PeakNIR1 = rat.readColumn(ratDataset, "JulyB7")
        PeakNIR2 = rat.readColumn(ratDataset, "JulyB8")

        # Read in the RAT columns for the Post-flush image.
        PostCoastal = rat.readColumn(ratDataset, "NovB1")
        # Call the function which classifies level 2 of the classification.
        print("Classifying Level 2")
        classLevel2 = classifyLevel2(wbiPre, wbiPeak, wbiPost, classLevel1Img)
        # Write the level 2 classification to the image.
        rat.writeColumn(ratDataset, "ClassLevel2", classLevel2)

        # Call the function which classifies level 3 of the classification.
        print("Classifying Level 3")
        classLevel3 = classifyLevel3(classLevel2, cult, urban)
        # Write the level 3 classification to the image.
        rat.writeColumn(ratDataset, "ClassLevel3", classLevel3)
    else:
        print("Image could not be opened")

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str, help="Specify the input image file.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()

    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input image file provided.")
        sys.exit(1)

    # Run the classification
    runClassification(args.input)
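To see how the chain of numpy.where calls builds up a hierarchical classification, the sketch below applies the level 2 rules to a handful of made-up segments, without needing GDAL, RIOS or an image. It is a simplified illustration rather than the script above: the input values are invented, a single water index (wbi) stands in for wbiPre and wbiPeak, and a Unicode ("U32") array is used in place of the "a255" byte strings.

```python
import numpy

# Hypothetical level 1 labels and water-index values for five segments,
# invented purely for illustration.
classLevel1 = numpy.array(["Urban", "NA", "Photosynthetic Vegetated",
                           "Non Submerged Aquatic Vegetated", "NA"])
wbi = numpy.array([0.5, 1.5, 0.2, 0.9, 0.3])

def classify_level2(level1, wbi):
    # Start every segment as "NA".
    level2 = numpy.full(level1.shape, "NA", dtype="U32")
    # Each rule overwrites only the segments it matches.
    level2 = numpy.where(numpy.logical_or(level1 == "NA", level1 == "Urban"),
                         "Terrestrial Non Vegetated", level2)
    level2 = numpy.where(numpy.logical_and(level1 != "Urban", wbi > 1),
                         "Aquatic Non Vegetated", level2)
    level2 = numpy.where(numpy.logical_or(level1 == "Photosynthetic Vegetated",
                                          level1 == "Non Photosynthetic Vegetated"),
                         "Terrestrial Vegetated", level2)
    level2 = numpy.where(level1 == "Non Submerged Aquatic Vegetated",
                         "Aquatic Vegetated", level2)
    return level2

result = classify_level2(classLevel1, wbi)
print(result)
```

Because each numpy.where only overwrites the segments matching its condition, the order of the rules matters: for a segment matched by two rules, the later rule wins (here the second "NA" segment becomes "Aquatic Non Vegetated" because its water index exceeds 1).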
Run the Classification
To run the classification use the following command:
Colour the Classification
Following the classification, the segments need to be coloured; the script for this is shown below. The previous example of adding a colour table is not suited to this case, as colours are being applied to the individual segments based on their class allocation.
#!/usr/bin/env python

# Import the system library
import sys
# Import the rat module from RIOS
from rios import rat
# Import numpy
import numpy
# Import GDAL
import osgeo.gdal as gdal
# Import the Python argument parser
import argparse

# A function for assigning colours to the level 3 classes
def colourLevel3(classLevel3):
    # Create the empty output arrays and set them
    # so they all have a value of 0 other than
    # opacity, which is 255 to create solid colours.
    level3red = numpy.empty_like(classLevel3, dtype=int)
    level3red[...] = 0
    level3green = numpy.empty_like(classLevel3, dtype=int)
    level3green[...] = 0
    level3blue = numpy.empty_like(classLevel3, dtype=int)
    level3blue[...] = 0
    level3alpha = numpy.empty_like(classLevel3, dtype=int)
    level3alpha[...] = 255

    # Set segments of class NA to be black
    level3red = numpy.where(classLevel3 == "NA", 0, level3red)

    # Colour Cultivated Terrestrial Vegetated
    level3red = numpy.where(classLevel3 == "Cultivated Terrestrial Vegetated", 192, level3red)
    level3green = numpy.where(classLevel3 == "Cultivated Terrestrial Vegetated", 255, level3green)
    level3blue = numpy.where(classLevel3 == "Cultivated Terrestrial Vegetated", 0, level3blue)
    level3alpha = numpy.where(classLevel3 == "Cultivated Terrestrial Vegetated", 255, level3alpha)

    # Colour Natural Terrestrial Vegetated
    level3red = numpy.where(classLevel3 == "Natural Terrestrial Vegetated", 0, level3red)
    level3green = numpy.where(classLevel3 == "Natural Terrestrial Vegetated", 128, level3green)
    level3blue = numpy.where(classLevel3 == "Natural Terrestrial Vegetated", 0, level3blue)
    level3alpha = numpy.where(classLevel3 == "Natural Terrestrial Vegetated", 255, level3alpha)

    # Colour Cultivated Aquatic Vegetated
    level3red = numpy.where(classLevel3 == "Cultivated Aquatic Vegetated", 0, level3red)
    level3green = numpy.where(classLevel3 == "Cultivated Aquatic Vegetated", 255, level3green)
    level3blue = numpy.where(classLevel3 == "Cultivated Aquatic Vegetated", 255, level3blue)
    level3alpha = numpy.where(classLevel3 == "Cultivated Aquatic Vegetated", 255, level3alpha)

    # Colour Natural Aquatic Vegetated
    level3red = numpy.where(classLevel3 == "Natural Aquatic Vegetated", 0, level3red)
    level3green = numpy.where(classLevel3 == "Natural Aquatic Vegetated", 192, level3green)
    level3blue = numpy.where(classLevel3 == "Natural Aquatic Vegetated", 122, level3blue)
        # Print some user feedback
        print("Colouring Level 3")
        # Call the function to assign colours to the arrays
        level3red, level3green, level3blue, level3alpha = colourLevel3(level3)
        # Write the values to the output columns
        rat.writeColumn(ratDataset, "Red", level3red)
        rat.writeColumn(ratDataset, "Green", level3green)
        rat.writeColumn(ratDataset, "Blue", level3blue)
        rat.writeColumn(ratDataset, "Alpha", level3alpha)
    else:
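Writing one numpy.where call per class and per channel becomes long as more classes are added. A more compact alternative is to keep the colours in a dictionary and assign all four channels at once with a boolean mask. The sketch below assumes the class names are held in a NumPy string array. CLASS_COLOURS and colour_classes are hypothetical names, and only three of the classes from the listing are included.

```python
import numpy

# Hypothetical class-to-RGBA lookup; the values match the listing
# for these three classes.
CLASS_COLOURS = {
    "Cultivated Terrestrial Vegetated": (192, 255, 0, 255),
    "Natural Terrestrial Vegetated": (0, 128, 0, 255),
    "Cultivated Aquatic Vegetated": (0, 255, 255, 255),
}

def colour_classes(classes, lookup, default=(0, 0, 0, 255)):
    # One row per segment, one column per channel (R, G, B, A),
    # initialised to the default colour (solid black, as for "NA").
    rgba = numpy.empty((len(classes), 4), dtype=numpy.uint8)
    rgba[:] = default
    for name, colour in lookup.items():
        # Boolean mask selecting every segment of this class.
        rgba[classes == name] = colour
    return rgba[:, 0], rgba[:, 1], rgba[:, 2], rgba[:, 3]

classes = numpy.array(["NA", "Natural Terrestrial Vegetated",
                       "Cultivated Aquatic Vegetated"])
red, green, blue, alpha = colour_classes(classes, CLASS_COLOURS)
```

The four returned arrays could then be written to the Red, Green, Blue and Alpha columns with rat.writeColumn, as in the listing. Adding a class then only needs one extra dictionary entry.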
9.5 9.6
GDAL - http://www.gdal.org
Python Documentation - http://www.python.org/doc
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall ISBN 0-13-226993-7
Learn UNIX in 10 minutes - http://freeengineer.org/learnUNIXin10minutes.html
SciPy - http://www.scipy.org/SciPy
NumPy - http://numpy.scipy.org
RIOS - https://bitbucket.org/chchrsc/rios/wiki/Home
The aim of this worksheet is to develop a population model for a bird called the Golden Plover.
10.2
Model Output
The model is required to output the total population of the birds for each year, as well as the number of adult birds, eggs and fledglings, and the number of fledglings which are a year old. An option to export the results as a plot should also be provided.
10.3
Reading Parameters
To allow a user to parameterise the model, a parameter card such as the one shown below needs to be provided.
numOfYears=20
initalAdultPairPop=15
winterSurvivalRate=0.66
averageEggsPerPair=3.64
averageFledgelingsPerPair=3.2
predatorControl=False
numOfFledgelings=14
numOfFledgelingsYearOld=8
fledgelingsSurvivePredatorsCtrl=0.75
fledgelingsSurvivePredatorsNoCtrl=0.18

#!/usr/bin/env python

# Import the system library
import sys
# Import the Python argument parser
import argparse
# Import the maths library
import math as math

# A class for the golden plover population model
class GoldenPloverPopModel(object):

    # A function to parse the input parameters file.
    def parseParameterFile(self, inputFile):
        # A string to store the input parameters so they
        # can be written to the output file.
        paramsStr = "## Input Parameters to the model.\n"
        # Open the input parameter file.
        parameterFile = open(inputFile, "r")
        # Create a dictionary object to store the
        # input parameters.
        params = dict()
        # Loop through each line of the input
        # text file.
        for line in parameterFile:
            # Strip any white space either
            # side of the text.
            line = line.strip()
            # Add the line to the output
            # parameters file.
# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str, help="Specify the input parameter file.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()

    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input parameter file provided.")
        sys.exit(1)

    obj = GoldenPloverPopModel()
    obj.run(args.input)
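Only the start of parseParameterFile is shown above. The sketch below is a minimal, self-contained way of turning a parameter card into a dictionary with typed values. It makes the following assumptions: every setting is a key=value line, blank lines and lines starting with # are skipped, and values are converted to bool, int or float where possible. parse_parameter_text is a hypothetical name, and unlike the method above it returns only the dictionary.

```python
def parse_parameter_text(text):
    """Parse key=value lines into a dictionary with typed values."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Booleans need explicit handling: bool("False") is True.
        if value in ("True", "False"):
            params[key] = (value == "True")
            continue
        # Otherwise try int, then float, else keep the string.
        try:
            params[key] = int(value)
        except ValueError:
            try:
                params[key] = float(value)
            except ValueError:
                params[key] = value
    return params

card = """numOfYears=20

# an illustrative comment line
winterSurvivalRate=0.66
predatorControl=False"""
params = parse_parameter_text(card)
print(params)
```

Converting the values when the file is read means the rest of the model can use params["numOfYears"] directly in range() and params["predatorControl"] directly in an if statement.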
10.4
The Model
#!/usr/bin/env python

# Import the system library
import sys
# Import the Python argument parser
import argparse
# Import the maths library
            # Get the number of adults and fledgelings following winter,
            # based on their winter survival rate.
            numOfAdultsPairs = int(numOfAdultsPairs * params["winterSurvivalRate"])
            numOfFledgelingsYearOld = int(numOfFledgelingsYearOld * params["winterSurvivalRate"])
            # Get the number of eggs to hatch.
            numOfEggs = int(numOfAdultsPairs * params["averageEggsPerPair"])
            # Append to the output list.
            numOfEggsOut.append(numOfEggs)
            # Get the number of new fledgelings.
            numOfFledgelings = int(numOfAdultsPairs * params["averageFledgelingsPerPair"])
            # Append to the output list.
            numOfFledgelingsB4PredOut.append(numOfFledgelings)

            # Apply the fledgeling survival rate, with an option to
            # apply predator control (or not).
            if params["predatorControl"]:
                # With predator control
                numOfFledgelings = int(numOfFledgelings * params["fledgelingsSurvivePredatorsCtrl"])
            else:
                # Without predator control
                numOfFledgelings = int(numOfFledgelings * params["fledgelingsSurvivePredatorsNoCtrl"])

        # Once the model has completed, return the output variables for analysis.
        return numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut,

    # The run function controlling the overall order
    # in which things run.
    def run(self, inputFile):
        # Provide feedback to the user.
        print("Parse Input File.")
        # Call the function to parse the input file.
        params, paramsStr = self.parseParameterFile(inputFile)
        # Print the parameters.
        print(params)

        # Provide some progress feedback to the user.
        print("Run the model")
        # Run the model and get the output parameters.
        numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut, numOfFl

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str, help="Specify the input parameter file.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()

    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input parameter file provided.")
        sys.exit(1)

    # Create an instance of the model class
    obj = GoldenPloverPopModel()
    # Call the run function to execute the model.
    obj.run(args.input)
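Only fragments of the model loop appear in the listings. As a rough, self-contained sketch, those fragments can be assembled into a single function. This rests on several assumptions:

- The update order follows the fragments: year-old birds join the breeding pairs, last year's fledglings become year-old birds, then winter survival, breeding and predation are applied.
- Values are recorded at the end of each year.
- The pairing step uses // to keep the integer division that the original Python 2 code relied on.
- run_model and the keys of the returned dictionary are hypothetical names.

The parameter values are those from the parameter card.

```python
def run_model(params):
    """Simulate the population year by year and return the yearly outputs."""
    pairs = params["initalAdultPairPop"]
    yearOld = params["numOfFledgelingsYearOld"]
    fledgelings = params["numOfFledgelings"]
    # Choose the fledgling survival rate once, depending on predator control.
    if params["predatorControl"]:
        predSurvival = params["fledgelingsSurvivePredatorsCtrl"]
    else:
        predSurvival = params["fledgelingsSurvivePredatorsNoCtrl"]

    out = {"pairs": [], "yearOld": [], "eggs": [],
           "fledgelingsB4Pred": [], "fledgelings": []}
    for _ in range(params["numOfYears"]):
        # Year-old birds join the breeding population (assuming all pair up).
        pairs += yearOld // 2
        # Last year's fledglings become this year's year-old birds.
        yearOld = fledgelings
        # Apply the winter survival rate.
        pairs = int(pairs * params["winterSurvivalRate"])
        yearOld = int(yearOld * params["winterSurvivalRate"])
        # Breeding: eggs, and fledglings before predation.
        eggs = int(pairs * params["averageEggsPerPair"])
        fledgelingsB4Pred = int(pairs * params["averageFledgelingsPerPair"])
        # Predation.
        fledgelings = int(fledgelingsB4Pred * predSurvival)
        # Record this year's values.
        out["pairs"].append(pairs)
        out["yearOld"].append(yearOld)
        out["eggs"].append(eggs)
        out["fledgelingsB4Pred"].append(fledgelingsB4Pred)
        out["fledgelings"].append(fledgelings)
    return out

params = {"numOfYears": 20, "initalAdultPairPop": 15, "winterSurvivalRate": 0.66,
          "averageEggsPerPair": 3.64, "averageFledgelingsPerPair": 3.2,
          "predatorControl": False, "numOfFledgelings": 14,
          "numOfFledgelingsYearOld": 8, "fledgelingsSurvivePredatorsCtrl": 0.75,
          "fledgelingsSurvivePredatorsNoCtrl": 0.18}
out = run_model(params)
print(out["pairs"])
```

Switching predatorControl to True changes only the survival rate applied to the new fledglings. The two scenarios can therefore be compared by running the function twice with different dictionaries.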
10.5
Exporting Data
#!/usr/bin/env python

# Import the system library
import sys
# Import the Python argument parser
            # Get the number of adults and fledgelings following winter,
            # based on their winter survival rate.
            numOfAdultsPairs = int(numOfAdultsPairs * params["winterSurvivalRate"])
            numOfFledgelingsYearOld = int(numOfFledgelingsYearOld * params["winterSurvivalRate"])
            # Get the number of eggs to hatch.
            numOfEggs = int(numOfAdultsPairs * params["averageEggsPerPair"])
            # Append to the output list.
            numOfEggsOut.append(numOfEggs)
            # Get the number of new fledgelings.
            numOfFledgelings = int(numOfAdultsPairs * params["averageFledgelingsPerPair"])
            # Append to the output list.
            numOfFledgelingsB4PredOut.append(numOfFledgelings)

            # Apply the fledgeling survival rate, with an option to
            # apply predator control (or not).
            if params["predatorControl"]:
                # With predator control
                numOfFledgelings = int(numOfFledgelings * params["fledgelingsSurvivePredatorsCtrl"])
            else:
                # Without predator control
                numOfFledgelings = int(numOfFledgelings * params["fledgelingsSurvivePredatorsNoCtrl"])

        # Once the model has completed, return the output variables for analysis.
        return numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut,

    # A function to write the results to a text file
    # for analysis or visualisation within another
    # package.
    def writeResultsFile(self, outputFile, paramStr, params, numOfAdultsPairsOut, numYearOldFledg
        # Open the output file for writing.
        outFile = open(outputFile, "w")
        # Write the input parameters (the string formed
        # when the input parameters were read in). This is
        # useful as it allows someone to understand
        # where these outputs came from.
        outFile.write(paramStr)
        # Write a header indicating that what follows is
        # the model output.
        outFile.write("\n\n## Output Results.\n")

        # Create a string for each row of the output
        # file. Each row presents a parameter.
        yearStrs = "Year"
        numOfAdultsStrs = "NumberOfAdultsPairs"
        numOfYearOldFledgesStrs = "NumberOfYearOldFledgelings"
        numOfFledgesStrs = "NumberOfFledgelings"
        numOfFledgesB4PredStrs = "NumberOfFledgelingsB4Preds"
        numOfEggsStrs = "NumberOfEggs"

        # Loop through each year, building the output strings.
        for year in range(params["numOfYears"]):
            yearStrs += "," + str(year)
            numOfAdultsStrs += "," + str(numOfAdultsPairsOut[year])
            numOfYearOldFledgesStrs += "," + str(numYearOldFledgelingsOut[year])
            numOfFledgesStrs += "," + str(numOfFledgelingsOut[year])
            numOfFledgesB4PredStrs += "," + str(numOfFledgelingsB4PredOut[year])
            numOfEggsStrs += "," + str(numOfEggsOut[year])

        # Add a new line character to the end of each row.
        yearStrs += "\n"
        numOfAdultsStrs += "\n"
        numOfYearOldFledgesStrs += "\n"
        numOfFledgesStrs += "\n"
        numOfFledgesB4PredStrs += "\n"
        numOfEggsStrs += "\n"

        # Write the rows to the output file.
        outFile.write(yearStrs)
        outFile.write(numOfAdultsStrs)
        outFile.write(numOfFledgesStrs)
        outFile.write(numOfFledgesB4PredStrs)
    # The run function controlling the overall order
    # in which things run.
    def run(self, inputFile, outputFile):
        # Provide feedback to the user.
        print("Parse Input File.")
        # Call the function to parse the input file.
        params, paramsStr = self.parseParameterFile(inputFile)
        # Print the parameters.
        print(params)

        # Provide some progress feedback to the user.
        print("Run the model")
        # Run the model and get the output parameters.
        numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut, numOfFl

        # Provide some feedback to the user.
        print("Write the results to an output file")
        # Call the function to write the outputs
        # to a text file.
        self.writeResultsFile(outputFile, paramsStr, params, numOfAdultsPairsOut, numYear

# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str, help="Specify the input parameter file.")
    # Define the argument for specifying the output file.
    parser.add_argument("-o", "--output", type=str, help="Specify the output text file.")
    # Call the parser to parse the arguments.
    args = parser.parse_args()

    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input parameter file provided.")
        sys.exit(1)

    # Check that the output parameter has been specified.
    if args.output is None:
        # Print an error message if not and exit.
        print("Error: No output text file provided.")
        sys.exit(1)

    # Create an instance of the model class
    obj = GoldenPloverPopModel()
    # Call the run function to execute the model.
    obj.run(args.input, args.output)
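The listing builds each output row by string concatenation. The standard library csv module can produce the same layout, a "Year" row followed by one comma-separated row per variable, with less bookkeeping. In the sketch below, write_results is a hypothetical helper that writes to any file-like object, so an io.StringIO can stand in for the output file. The numbers are made up.

```python
import csv
import io

def write_results(outFile, paramsStr, numOfYears, series):
    """Write the parameter header, then one comma-separated row per variable."""
    outFile.write(paramsStr)
    outFile.write("\n\n## Output Results.\n")
    writer = csv.writer(outFile, lineterminator="\n")
    # First row: the year indices.
    writer.writerow(["Year"] + list(range(numOfYears)))
    # One row per output variable, in the order given.
    for name, values in series:
        writer.writerow([name] + list(values))

buf = io.StringIO()
write_results(buf, "## Input Parameters to the model.\nnumOfYears=3\n", 3,
              [("NumberOfAdultsPairs", [12, 10, 8]),
               ("NumberOfEggs", [43, 36, 29])])
text = buf.getvalue()
print(text)
```

Passing the series as a list of (name, values) pairs means each variable is written exactly once, in a known order. Inside the script, the helper would be given a file opened with open(outputFile, "w", newline="").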
10.6
Creating Plots
#!/usr/bin/env python

# Import the system library
import sys
# Import the Python argument parser
import argparse
# Import the maths library
import math as math

# A function to test whether a module
# is present.
def module_exists(module_name):
    # Using a try block will
    # catch the exception thrown
    # if the module is not
    # available.
    try:
        # Try to import the module.
        __import__(module_name)
    # Catch the import error.
    except ImportError:
        # Return False because
        # the module could not
        # be imported.
        return False
    else:
        # The module was successfully
        # imported so return True.
        return True
            # Get the number of pairs (assuming all adults
            # are paired).
            numOfAdultsPairs += (numOfFledgelingsYearOld // 2)
            # Set the number of year old fledgelings.
            numOfFledgelingsYearOld = numOfFledgelings
            # Get the number of adults and fledgelings following winter,
            # based on their winter survival rate.
            numOfAdultsPairs = int(numOfAdultsPairs * params["winterSurvivalRate"])
            numOfFledgelingsYearOld = int(numOfFledgelingsYearOld * params["winterSurvivalRate"])
            # Get the number of eggs to hatch.
            numOfEggs = int(numOfAdultsPairs * params["averageEggsPerPair"])
            # Append to the output list.
            numOfEggsOut.append(numOfEggs)
            # Get the number of new fledgelings.
            numOfFledgelings = int(numOfAdultsPairs * params["averageFledgelingsPerPair"])
            # Append to the output list.
            numOfFledgelingsB4PredOut.append(numOfFledgelings)

            # Apply the fledgeling survival rate, with an option to
            # apply predator control (or not).
            if params["predatorControl"]:
                # With predator control
                numOfFledgelings = int(numOfFledgelings * params["fledgelingsSurvivePredatorsCtrl"])
            else:
                # Without predator control
                numOfFledgelings = int(numOfFledgelings * params["fledgelingsSurvivePredatorsNoCtrl"])

        # Once the model has completed, return the output variables for analysis.
        return numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut,
    # A function to write the results to a text file
    # for analysis or visualisation within another
    # package.
    def writeResultsFile(self, outputFile, paramStr, params, numOfAdultsPairsOut, numYearOldFledg
        # Open the output file for writing.
        outFile = open(outputFile, "w")
        # Write the input parameters (the string formed
        # when the input parameters were read in). This is
        # useful as it allows someone to understand
        # where these outputs came from.
        outFile.write(paramStr)
        # Write a header indicating that what follows is
        # the model output.
        outFile.write("\n\n## Output Results.\n")

        # Create a string for each row of the output
        # file. Each row presents a parameter.
        yearStrs = "Year"
        numOfAdultsStrs = "NumberOfAdultsPairs"
        numOfYearOldFledgesStrs = "NumberOfYearOldFledgelings"
        numOfFledgesStrs = "NumberOfFledgelings"
        numOfFledgesB4PredStrs = "NumberOfFledgelingsB4Preds"
        numOfEggsStrs = "NumberOfEggs"

        # Loop through each year, building the output strings.
        for year in range(params["numOfYears"]):
            yearStrs += "," + str(year)
            numOfAdultsStrs += "," + str(numOfAdultsPairsOut[year])
            numOfYearOldFledgesStrs += "," + str(numYearOldFledgelingsOut[year])
            numOfFledgesStrs += "," + str(numOfFledgelingsOut[year])
            numOfFledgesB4PredStrs += "," + str(numOfFledgelingsB4PredOut[year])
            numOfEggsStrs += "," + str(numOfEggsOut[year])

        # Add a new line character to the end of each row.
        yearStrs += "\n"
        numOfAdultsStrs += "\n"
        numOfYearOldFledgesStrs += "\n"
        numOfFledgesStrs += "\n"
        numOfFledgesB4PredStrs += "\n"
        numOfEggsStrs += "\n"

        # Write each row to the output file once.
        outFile.write(yearStrs)
        outFile.write(numOfAdultsStrs)
        outFile.write(numOfYearOldFledgesStrs)
        outFile.write(numOfFledgesStrs)
        outFile.write(numOfFledgesB4PredStrs)
        outFile.write(numOfEggsStrs)
        # Close the output file.
    def plots(self, outputFile, params, numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsO
        # Test that the matplotlib library is present
        # so plots can be created.
        if module_exists("matplotlib.pyplot"):
            # The matplotlib library exists so
            # import it for use in this function.
            # Importing a library within a function
            # like this means it is only
            # available within this function.
            import matplotlib.pyplot as plt

            # Get the years as a sequence.
            years = range(params["numOfYears"])

            # Create a simple plot for the number of
            # pairs.
            fig1 = plt.figure(figsize=(15, 5), dpi=150)
            plt.plot(years, numOfAdultsPairsOut)
            plt.title("Number of pairs per year predicted by model")
            plt.xlabel("Year")
            plt.ylabel("Number Of Pairs")
            plt.savefig((outputFile + "_adultpairs.pdf"), format="pdf")

            # Create a simple plot for the number of
            # year old fledgelings.
            fig2 = plt.figure(figsize=(15, 5), dpi=150)
            plt.plot(years, numYearOldFledgelingsOut)
            plt.title("Number of year old fledgelings predicted by model")
            plt.xlabel("Year")
            plt.ylabel("Number Of Fledglings")
            plt.savefig((outputFile + "_numYearOldFledgelings.pdf"), format="pdf")

            # Create a simple plot for the number of
            # eggs hatched each year.
            fig3 = plt.figure(figsize=(15, 5), dpi=150)
            plt.plot(years, numOfEggsOut)
            plt.title("Number of eggs per year predicted by model")
            plt.xlabel("Year")
            plt.ylabel("Number Of Eggs")
            # Create a simple plot for the number of
            # fledgelings before that year's breeding.
            fig5 = plt.figure(figsize=(15, 5), dpi=150)
            plt.plot(years, numOfFledgelingsB4PredOut)
            plt.title("Number of fledgelings before breeding per year predicted by model")
            plt.xlabel("Year")
            plt.ylabel("Number Of Fledgelings")
            plt.savefig((outputFile + "_numOfFledgelingsB4Pred.pdf"), format="pdf")
        else:
            # If the matplotlib library is not available
            # print out a suitable error message.
            print("Matplotlib is not available and therefore the plots cannot be created.")
    # The run function controlling the overall order
    # in which things run.
    def run(self, inputFile, outputFile, plotsPath):
        # Provide feedback to the user.
        print("Parse Input File.")
        # Call the function to parse the input file.
        params, paramsStr = self.parseParameterFile(inputFile)
        # Print the parameters.
        print(params)

        # Provide some progress feedback to the user.
        print("Run the model")
        # Run the model and get the output parameters.
        numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEggsOut, numOfFledgelingsOut, numOfFl

        # Provide some feedback to the user.
        print("Write the results to an output file")
        # Call the function to write the outputs
        # to a text file.
        self.writeResultsFile(outputFile, paramsStr, params, numOfAdultsPairsOut, numYearOldFledg

        # Check whether a path has been provided
        # for the plots. If it has, then generate
        # output plots.
        if plotsPath is not None:
            # Give the user feedback on what's happening.
            print("Generating plots of the results")
            # Call the function to generate plots
            self.plots(plotsPath, params, numOfAdultsPairsOut, numYearOldFledgelingsOut, numOfEgg
# This is the first part of the script to
# be executed.
if __name__ == "__main__":
    # Create the command line options parser.
    parser = argparse.ArgumentParser()
    # Define the argument for specifying the input file.
    parser.add_argument("-i", "--input", type=str, help="Specify the input parameter file.")
    # Define the argument for specifying the output file.
    parser.add_argument("-o", "--output", type=str, help="Specify the output text file.")
    # Define the argument for specifying the base path for the plots.
    parser.add_argument("-p", "--plot", type=str, help="Specify the output base path for the plot
    # Call the parser to parse the arguments.
    args = parser.parse_args()

    # Check that the input parameter has been specified.
    if args.input is None:
        # Print an error message if not and exit.
        print("Error: No input parameter file provided.")
        sys.exit(1)

    # Check that the output parameter has been specified.
    if args.output is None:
        # Print an error message if not and exit.
        print("Error: No output text file provided.")
        sys.exit(1)

    # Create an instance of the model class
    obj = GoldenPloverPopModel()
    # Call the run function to execute the model.
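module_exists works by actually importing the module. Where only the availability of a module needs to be known, the standard library function importlib.util.find_spec can be used instead. It locates a module without executing it, although for a dotted name such as matplotlib.pyplot the parent package (matplotlib) is imported first, and ModuleNotFoundError is raised if that parent is missing. The sketch below is an alternative to the version used in the script, not a replacement the script depends on.

```python
import importlib.util

def module_exists(module_name):
    """Return True if the import system can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is missing.
        return False

print(module_exists("json"))               # standard library module
print(module_exists("no_such_module_xyz"))
```

Either version can guard the matplotlib import in plots, so the model still runs and writes its text output on machines without matplotlib.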
10.7 10.8
Python Documentation - http://www.python.org/doc
Core Python Programming (Second Edition), W.J. Chun. Prentice Hall ISBN 0-13-226993-7
Appendix A RSGISLib
A.1
Introduction to RSGISLib
The remote sensing and GIS software library (RSGISLib) was developed at Aberystwyth University by Pete Bunting and Daniel Clewley. Development started in April 2008 and has been actively maintained and added to ever since. For more information see http://www.rsgislib.org.
A.2
Using RSGISLib
RSGISLib has a command line user interface, where the main commands you will be using are:
rsgisexe - the main command to execute scripts
rsgislibxmllist - a command to list all the available commands within the library
rsgislibcmdxml.py - a command to allow script templates to be populated with file paths and names
rsgislibvarsxml.py - a command to input variable values into a template script
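The template commands substitute values for placeholders such as $FILEPATH, $PATH and $FILENAME, which appear in the segmentation XML later in this appendix. The sketch below only illustrates the idea and is not how rsgislibcmdxml.py or rsgislibvarsxml.py are implemented. fill_template is a hypothetical helper and the paths are made up. Python's string.Template is deliberately not used, because it would read $FILENAME_stretched as a single variable named FILENAME_stretched.

```python
import re

def fill_template(template, values):
    """Replace each $NAME placeholder with values[NAME]."""
    # Try longer names first so a short name can never match the
    # beginning of a longer one.
    names = sorted(values, key=len, reverse=True)
    pattern = re.compile(r"\$(" + "|".join(map(re.escape, names)) + ")")
    return pattern.sub(lambda m: values[m.group(1)], template)

xml = ('<rsgis:command algor="imageutils" option="stretch" image="$FILEPATH" '
       'output="$PATH/$FILENAME_stretched.kea" />')
filled = fill_template(xml, {"FILEPATH": "/data/scene1.kea",
                             "PATH": "/data", "FILENAME": "scene1"})
print(filled)
```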
A.2.1
XML Basics
RSGISLib is parameterised through the use of an XML script. XML stands for Extensible Markup Language.
Extensible - XML is extensible. It lets you define your own tags, the order in which they occur, and how they should be processed or displayed. Another way to think about extensibility is to consider that XML allows all of us to extend our notion of what a document is: it can be a file that lives on a file server, or it can be a transient piece of data that flows between two computer systems.
Markup - The most recognizable feature of XML is its tags, or elements (to be more accurate).
Language - XML is a language that's very similar to HTML. It's much more flexible than HTML because it allows you to create your own custom tags. However, it's important to realize that XML is not just a language. XML is a meta-language: a language that allows us to create or define other languages. For example, with XML we can create other languages, such as RSS,
<parent_element>
    <some_information> </some_information>
    <some_information name="some data" value="some other data" />
</parent_element>
XML is made up of opening and closing elements, where the hierarchy of the elements provides meaning and structure to the information stored. Therefore, every element that is opened must also be closed. This can be done in two ways. Firstly, with two tags, where the opening tag is simply enclosed in angle brackets (<tag>) and the closing tag adds a forward slash inside the angle brackets (</tag>). Using this method, further tags or data can be stored between the two tags, providing structure as shown above. The second method uses just a single tag ending with a forward slash (<tag/>). This second method is used when no data or further tags are to be defined below the current element.
<element></element>
<element/>
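An XML parser treats the paired and single-tag forms of an empty element identically. As an illustration using Python's standard xml.etree.ElementTree module (which is not part of RSGISLib), the first example above can be parsed and its elements and attributes inspected:

```python
import xml.etree.ElementTree as ET

doc = """<parent_element>
    <some_information> </some_information>
    <some_information name="some data" value="some other data" />
</parent_element>"""

root = ET.fromstring(doc)
for child in root:
    # Each child is an element; its attributes are held in a dict.
    print(child.tag, child.attrib)

# An empty element written with two tags or with one self-closing tag
# serialises to exactly the same thing once parsed.
same = (ET.tostring(ET.fromstring("<element></element>"))
        == ET.tostring(ET.fromstring("<element/>")))
print(same)
```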
Escape Characters
As with all computing languages, there are certain characters which have specific meanings, and therefore an escape sequence needs to be used if these characters are required within the input:
< - &lt;
> - &gt;
& - &amp;
" - &quot;
' - &apos;
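In practice, escaping is usually left to a library. For instance, Python's xml.sax.saxutils module provides escape and unescape. By default they handle only &, < and >, so in the sketch below the two quote characters are passed in as extra entities.

```python
from xml.sax.saxutils import escape, unescape

raw = 'Lakes & "Rivers" <2013>'
# & is replaced first, so the entities added afterwards are not double-escaped.
encoded = escape(raw, {'"': "&quot;", "'": "&apos;"})
# unescape reverses the substitutions, handling &amp; last.
decoded = unescape(encoded, {"&quot;": '"', "&apos;": "'"})
print(encoded)
```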
Commenting
To add comments to XML code, and to temporarily comment out parts of your XML script, you need to use the XML commenting syntax as shown below.
<!-- Some useful comment -->
<parent_element>
    <some_information> </some_information>
    <!-- This is some really useful information in this comment -->
    <some_information name="some data" value="some other data" />
</parent_element>
All parts of the document between the opening and closing comment tags will be ignored by the parser.
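This can be checked with any XML parser. For example, Python's standard xml.etree.ElementTree parser, used here purely for illustration, drops comments by default, so parsing the commented example above leaves only the two elements:

```python
import xml.etree.ElementTree as ET

doc = """<!-- Some useful comment -->
<parent_element>
    <some_information> </some_information>
    <!-- This is some really useful information in this comment -->
    <some_information name="some data" value="some other data" />
</parent_element>"""

# The default parser discards comments, so only the two
# <some_information> elements remain as children.
root = ET.fromstring(doc)
tags = [child.tag for child in root]
print(tags)
```

ElementTree can be asked to keep comments (through a TreeBuilder created with insert_comments=True, available from Python 3.8), but by default they are simply ignored.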
RSGISLib XML
For parameterisation of the rsgisexe application you will need to create an XML file in the correct format, which the RSGISLib executable understands, while adhering to the rules of XML outlined above. The basis for the RSGISLib XML is to provide a list of commands. Therefore, the XML has the following structure:
<?xml version="1.0" encoding="UTF-8" ?>
<!--
    Description:
        XML File for execution within RSGISLib
    Created by **ME** on Wed Nov 28 15:53:41 2012.
    Copyright (c) 2012 **Organisation**. All rights reserved.
-->

<rsgis:commands xmlns:rsgis="http://www.rsgislib.org/xml/">

    <!-- ENTER YOUR XML HERE -->
    <rsgis:command algor="name" option="algor_option" attr1="foo" attr2="bar">
        <rsgis:data attribute="blob" />
    </rsgis:command>
    <rsgis:command algor="algor_name" option="algorithm_option" attr="data"/>

</rsgis:commands>
All the input parameters are defined using element attributes, and each algorithm and option has its own set of attributes to be specified. Multiple command elements can be specified within the XML file imported into rsgisexe, and they will all be executed in the order in which they appear in the file. Therefore, a sequence of events can be specified and executed without any further interaction.
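Because the script is plain XML, any XML library can read it. As an illustration only (this is not how rsgisexe works internally), the sketch below uses xml.etree.ElementTree to list the algor and option attributes of each rsgis:command in document order, which is the order in which the commands would be executed. The {*} wildcard, which matches any namespace, needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

script = """<?xml version="1.0" encoding="UTF-8" ?>
<rsgis:commands xmlns:rsgis="http://www.rsgislib.org/xml/">
    <rsgis:command algor="name" option="algor_option" attr1="foo" attr2="bar">
        <rsgis:data attribute="blob" />
    </rsgis:command>
    <rsgis:command algor="algor_name" option="algorithm_option" attr="data"/>
</rsgis:commands>"""

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(script.encode("utf-8"))
# findall returns the command elements in document order.
commands = [(c.get("algor"), c.get("option")) for c in root.findall("{*}command")]
print(commands)
```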
A.3
Segmentation
The segmentation algorithm (?) is based on generating spectrally similar units with a minimum object size. The algorithm consists of a number of steps:
1. Select image bands and stack images
2. Stretch image data
3. Find unique clusters within feature space (KMeans)
4. Assign pixels to clusters
5. Clump the image
6. Eliminate small segments
The KMeans clustering takes just a single image, in which all the bands are used as input. If multiple images are to be used, they therefore need to be stacked and the bands to be used selected. As a Euclidean distance is used within the feature space, the image is stretched so that all the pixel values are within the same range (i.e., 0-255). A clustering algorithm is then used to identify the unique colours within the image. In this case KMeans clustering is used, but other clustering algorithms could be used instead. The image pixels are then assigned to the clusters (classifying the image), and the image is clumped to find its connected regions. The final step is an iterative elimination of the small segments, starting with the single pixels and going up to the minimum segment size specified by the user.
Therefore, there are two key parameters within the algorithm:
1. the number of cluster centres identified by the KMeans clustering;
2. the minimum size of the segments.
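To illustrate steps 2 to 4 and the role of the number of clusters, the sketch below works on a tiny hypothetical two-band image held in plain Python lists. It is not RSGISLib's implementation: it assumes a LinearStdDev-style stretch that maps the mean plus or minus n standard deviations linearly onto 0-255, a plain Lloyd's KMeans with a deterministic initialisation, and label 0 reserved for no-data pixels. All function names are hypothetical.

```python
import math


def stddev_stretch(band, n_std=2.0, ignore_zeros=True):
    """Map mean +/- n_std standard deviations linearly onto 0-255, clipping
    values outside that range; zeros are kept as 0 when ignore_zeros is set."""
    valid = [v for v in band if not (ignore_zeros and v == 0)]
    mean = sum(valid) / len(valid)
    std = math.sqrt(sum((v - mean) ** 2 for v in valid) / len(valid))
    low = mean - n_std * std
    span = 2 * n_std * std or 1.0  # avoid dividing by zero for a constant band
    out = []
    for v in band:
        if ignore_zeros and v == 0:
            out.append(0)
        else:
            out.append(min(255.0, max(0.0, (v - low) / span * 255.0)))
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centres):
    """Index of the centre closest to pixel vector p."""
    return min(range(len(centres)), key=lambda i: sq_dist(p, centres[i]))


def kmeans(pixels, k, max_iter=100):
    """Lloyd's KMeans on a list of equal-length tuples; returns the centres."""
    unique = sorted(set(pixels))
    k = min(k, len(unique))
    # Deterministic start: k unique pixel values spread through the sorted list
    centres = [unique[i * len(unique) // k] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in centres]
        for p in pixels:
            groups[nearest(p, centres)].append(p)
        # New centre = mean of its members; an empty cluster keeps its old centre
        new_centres = [tuple(sum(band) / len(g) for band in zip(*g)) if g else centres[i]
                       for i, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    return centres


def label_pixels(pixels, centres, nodata=0):
    """Label each pixel with 1 + the index of its nearest centre; pixels whose
    bands are all nodata are labelled 0."""
    return [0 if all(v == nodata for v in p) else 1 + nearest(p, centres)
            for p in pixels]


if __name__ == "__main__":
    band1 = [10, 12, 11, 50, 52, 0]  # hypothetical 6-pixel, 2-band image
    band2 = [20, 21, 19, 80, 82, 0]
    pixels = list(zip(stddev_stretch(band1), stddev_stretch(band2)))
    valid = [p for p in pixels if any(p)]
    centres = kmeans(valid, k=2)
    print(label_pixels(pixels, centres))  # [1, 1, 1, 2, 2, 0]
```

Raising k splits the feature space into more clusters and so tends to produce more, smaller segments, which is why it is one of the two key parameters.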
174
A.3.1 XML Code
<rsgis:command algor="imageutils" option="stretch" image="$FILEPATH" output="$PATH/$FILENAME_stretched.kea" ignorezeros="yes" stretch="LinearStdDev" stddev="2" format="KEA" />
<rsgis:command algor="imagecalc" option="bandmaths" output="$PATH/$FILENAME_mask.kea" format="KEA" expression="b1==0?0:1" >
    <rsgis:variable name="b1" image="$FILEPATH" band="1" />
</rsgis:command>
<rsgis:command algor="imageutils" option="mask" image="$PATH/$FILENAME_stretched.kea" mask="$PATH/$FILENAME_mask.kea" output="$PATH/$FILENAME_stretched_masked.kea" maskvalue="0" outputvalue="0" format="KEA" />
<rsgis:command algor="commandline" option="execute" command="rm $PATH/$FILENAME_mask.kea" />
<rsgis:command algor="commandline" option="execute" command="rm $PATH/$FILENAME_stretched.kea" />
<rsgis:command algor="segmentation" option="labelsfromclusters" image="$PATH/$FILENAME_stretched_masked.kea" output="$PATH/$FILENAME_clusters.kea" clusters="$PATH/$FILENAME_clusters.gmtxt" ignorezeros="yes" format="KEA" proj="IMAGE" />
<rsgis:command algor="segmentation" option="elimsinglepxls" image="$PATH/$FILENAME_stretched_masked.kea" clumps="$PATH/$FILENAME_clusters.kea" temp="$PATH/$FILENAME_clusters_singlepxls_tmp.kea" output="$PATH/$FILENAME_clusters_nosinglepxls.kea" ignorezeros="yes" format="KEA" proj="IMAGE" />
<rsgis:command algor="commandline" option="execute" command="rm $PATH/$FILENAME_clusters.kea" />
<rsgis:command algor="commandline" option="execute" command="rm $PATH/$FILENAME_clusters_singlepxls_tmp.kea" />
<rsgis:command algor="segmentation" option="clump" image="$PATH/$FILENAME_clusters_nosinglepxls.kea" output="$PATH/$FILENAME_clumps.kea" nodata="0" format="KEA" inmemory="no" proj="IMAGE" />
output="$PATH/$FILENAME_clumps_elim.kea" minsize="50" maxspectraldist="200000" format="KEA" inmemory="no" proj="IMAGE" />
<rsgis:command algor="commandline" option="execute" command="rm $PATH/$FILENAME_stretched_masked.kea" />
<rsgis:command algor="commandline" option="execute" command="rm $PATH/$FILENAME_clumps.kea" />
<rsgis:command algor="segmentation" option="meanimg" image="$FILEPATH" clumps="$PATH/$FILENAME_clumps_elim_final.kea" output="$PATH/$FILENAME_clumps_elim_mean.kea" format="KEA" inmemory="no" proj="IMAGE" />
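The clump and small-segment elimination steps can be sketched in the same way. This is an illustrative sketch, not the RSGISLib code: it assumes 4-connectivity and a single band of values, and merges each undersized clump into the neighbouring clump with the closest mean value, working up from single pixels to the minimum size (the role of minsize above). It has no equivalent of the maxspectraldist attribute and does not renumber the clumps afterwards; the function names are hypothetical.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def clump(labels, nodata=0):
    """Give each 4-connected region of equal labels its own ID (1, 2, ...);
    nodata cells stay 0."""
    rows, cols = len(labels), len(labels[0])
    clumps = [[0] * cols for _ in range(rows)]
    next_id = 1
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == nodata or clumps[r][c]:
                continue
            clumps[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the region
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols and not clumps[ny][nx]
                            and labels[ny][nx] == labels[r][c]):
                        clumps[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return clumps


def find(parent, cid):
    """Follow merge links to the clump ID that cid now belongs to."""
    while cid in parent:
        cid = parent[cid]
    return cid


def eliminate_small(clumps, values, min_size):
    """Merge clumps smaller than min_size into the neighbouring clump with
    the closest mean value, working upwards from single pixels."""
    rows, cols = len(clumps), len(clumps[0])
    grid = [row[:] for row in clumps]
    for size in range(1, min_size):
        # Pixel count and summed value for every clump
        stats = {}
        for r in range(rows):
            for c in range(cols):
                if grid[r][c]:
                    n, total = stats.get(grid[r][c], (0, 0.0))
                    stats[grid[r][c]] = (n + 1, total + values[r][c])
        means = {cid: total / n for cid, (n, total) in stats.items()}
        # Neighbouring clumps of every clump that has exactly `size` pixels
        nbrs = {cid: set() for cid, (n, _) in stats.items() if n == size}
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] in nbrs:
                    for dy, dx in NEIGHBOURS:
                        ny, nx = r + dy, c + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] not in (0, grid[r][c])):
                            nbrs[grid[r][c]].add(grid[ny][nx])
        # Union-find links, so chained and mutual merges resolve to one ID
        parent = {}
        for cid in sorted(nbrs):
            if nbrs[cid]:
                best = min(nbrs[cid], key=lambda o: abs(means[o] - means[cid]))
                a, b = find(parent, cid), find(parent, best)
                if a != b:
                    parent[a] = b
        for r in range(rows):
            for c in range(cols):
                grid[r][c] = find(parent, grid[r][c])
    return grid


if __name__ == "__main__":
    classes = [[1, 1, 1, 2],  # hypothetical cluster labels; the class-2
               [1, 2, 1, 2],  # pixel at row 1, column 1 is isolated
               [1, 1, 1, 2]]
    values = [[10, 10, 10, 50],
              [10, 12, 10, 50],
              [10, 10, 10, 50]]
    clumps = clump(classes)
    print(clumps)                              # [[1, 1, 1, 2], [1, 3, 1, 2], [1, 1, 1, 2]]
    print(eliminate_small(clumps, values, 2))  # [[1, 1, 1, 2], [1, 1, 1, 2], [1, 1, 1, 2]]
```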
To use the script provided, run the rsgislibxml.py command, which replaces $FILEPATH with the file path of each input image (found by rsgislibxml.py within the input directory), $PATH with the provided directory path and $FILENAME with the name of the input file. An example of this command is given below:
rsgislibxml.py -i RunSegmentationTemplate.xml \
    -o Segmentation.xml -p ./Segments \
    -d ./Data/ -e .kea -r no -t single
Once the command above has been executed, the segmentation can be run using the rsgisexe command:
rsgisexe -x Segmentation.xml
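The placeholder substitution described above can be pictured as plain text replacement. The sketch below is an assumption about how such a substitution could work, not the rsgislibxml.py source; in particular, it assumes $FILENAME is the input's base name without its extension, as output names such as L7ETM_530N035W_clumps_elim_final.kea suggest, and the input path in the usage example is hypothetical.

```python
import os


def fill_template(template_text, file_path, out_dir):
    """Replace the $FILEPATH, $FILENAME and $PATH placeholders for one input.

    $FILENAME is assumed to be the input's base name without its extension.
    """
    file_name = os.path.splitext(os.path.basename(file_path))[0]
    # str.replace is used rather than string.Template, which would read
    # "$FILENAME_stretched" as a single placeholder name.
    return (template_text.replace("$FILEPATH", file_path)
                         .replace("$FILENAME", file_name)
                         .replace("$PATH", out_dir))


if __name__ == "__main__":
    line = ('<rsgis:command algor="imageutils" option="stretch" '
            'image="$FILEPATH" output="$PATH/$FILENAME_stretched.kea" />')
    print(fill_template(line, "./Data/L7ETM_530N035W.kea", "./Segments"))
```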
Running rsgisexe on this file produces three output files:
1. $FILENAME_clusters.gmtxt: cluster centres.
2. $FILENAME_clumps_elim_final.kea: segment clumps.
3. $FILENAME_clumps_elim_mean.kea: mean colour image using the segments.
Following the segmentation, it is recommended that you make sure that the clumps file is defined as a thematic file, as demonstrated in the following piece of Python:
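#!/usr/bin/env python

# Import the system and GDAL python libraries
import sys
import osgeo.gdal as gdal

# Open the image named on the command line in update mode
dataset = gdal.Open(sys.argv[1], gdal.GA_Update)
# Mark every band of the image as thematic (rather than continuous)
for bandNum in range(dataset.RasterCount):
    band = dataset.GetRasterBand(bandNum + 1)
    band.SetMetadataItem("LAYER_TYPE", "thematic")
# Close the dataset so the change is written to disk
dataset = None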
Finally, use the gdalcalcstats command to populate the image with an attribute table, histogram and colour table (set -ignore 0, as 0 is the background no-data value).
setthematic.py L7ETM_530N035W_clumps_elim_final.kea
gdalcalcstats L7ETM_530N035W_clumps_elim_final.kea -ignore 0
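The histogram column of a clumps attribute table typically holds, for each clump ID, the number of pixels in that clump. As a hedged illustration of what -ignore 0 means, rather than the gdalcalcstats implementation, the counts for a small hypothetical clump grid can be computed as follows, with the background value 0 left at a count of zero. The function name is hypothetical.

```python
from collections import Counter


def clump_histogram(clumps, ignore=0):
    """Pixel count per clump ID (list index = clump ID), skipping `ignore`."""
    counts = Counter(v for row in clumps for v in row if v != ignore)
    max_id = max(counts, default=0)
    return [counts[i] for i in range(max_id + 1)]


if __name__ == "__main__":
    grid = [[0, 1, 1],
            [2, 2, 2],
            [0, 3, 1]]
    print(clump_histogram(grid))  # [0, 3, 3, 1]
```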
A.4 Populating Segments
To populate the segments with statistics (e.g., the mean of each spectral band), use the popattributestats command within the rastergis part of the RSGISLib software. Examples are shown within the XML code below; note that the text given for each band statistic is the name of the output column.
<?xml version="1.0" encoding="UTF-8" ?>
<!--
    Description:
        XML File for execution within RSGISLib
    Created by **ME** on Thu Mar 21 09:25:21 2013.
    Copyright (c) 2013 **Organisation**. All rights reserved.
-->
<rsgis:commands xmlns:rsgis="http://www.rsgislib.org/xml/">
    <rsgis:command algor="rastergis" option="popattributestats" clumps="L7ETM_530N035W_Classification.kea" input="L7ETM_530N035W_20100417_AtCor_osgb_masked.kea" >
        <rsgis:band band="1" mean="MayBlue" stddev="MaySDBlue" />
        <rsgis:band band="2" mean="MayGreen" stddev="MaySDGreen" />
        <rsgis:band band="3" mean="MayRed" stddev="MaySDRed" />
        <rsgis:band band="4" mean="MayNIR" stddev="MaySDNIR" />
        <rsgis:band band="5" mean="MaySWIR1" stddev="MaySDSWIR1" />
        <rsgis:band band="6" mean="MaySWIR2" stddev="MaySDSWIR2" />
    </rsgis:command>
    <rsgis:command algor="rastergis" option="popattributestats" clumps="L7ETM_530N035W_Classification.kea" input="L7ETM_530N035W_20100620_AtCor_osgb_masked.kea" >
        <rsgis:band band="1" mean="JuneBlue" stddev="JuneSDBlue" />
        <rsgis:band band="2" mean="JuneGreen" stddev="JuneSDGreen" />
        <rsgis:band band="3" mean="JuneRed" stddev="JuneSDRed" />
        <rsgis:band band="4" mean="JuneNIR" stddev="JuneSDNIR" />
        <rsgis:band band="5" mean="JuneSWIR1" stddev="JuneSDSWIR1" />
        <rsgis:band band="6" mean="JuneSWIR2" stddev="JuneSDSWIR2" />
    </rsgis:command>
    <rsgis:command algor="rastergis" option="popattributestats" clumps="L7ETM_530N035W_Classification.kea" input="Nant_y_Arian_DEM_30m.kea" >
        <rsgis:band band="1" min="MinDEM" max="MaxDEM" mean="MeanDEM" stddev="StdDevDEM" />
    </rsgis:command>
</rsgis:commands>
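Each rsgis:band element requests per-clump statistics of one image band, written to the named columns. The following pure-Python sketch shows what those columns contain for a tiny hypothetical clump grid and band; it is not the RSGISLib implementation, and it assumes a population standard deviation and that clump 0 (the background) is skipped. The function name is hypothetical.

```python
import math


def clump_stats(clumps, band, ignore=0):
    """Per-clump min, max, mean and population standard deviation."""
    values = {}
    for clump_row, band_row in zip(clumps, band):
        for cid, v in zip(clump_row, band_row):
            if cid != ignore:
                values.setdefault(cid, []).append(v)
    stats = {}
    for cid, vals in sorted(values.items()):
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        stats[cid] = {"min": min(vals), "max": max(vals),
                      "mean": mean, "stddev": std}
    return stats


if __name__ == "__main__":
    clumps = [[1, 1, 2],
              [1, 2, 2]]
    dem = [[10, 20, 100],   # hypothetical elevations
           [30, 110, 120]]
    for cid, s in clump_stats(clumps, dem).items():
        # Column names follow the DEM example above
        print(cid, s["min"], s["max"], s["mean"], round(s["stddev"], 3))
```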
If you are going to use indices and other derived information within your classification, it is often a good idea to set up a separate Python script to calculate those indices and write them back to the image's attribute table, rather than overcomplicating your classification script. An example of this is shown below.
#!/usr/bin/env python

# Import the GDAL python bindings and the RIOS
# raster attribute table (RAT) module
import osgeo.gdal as gdal
from rios import rat

# Open the clumps file populated with statistics above
# in update mode so that new columns can be written
clumpsImg = "L7ETM_530N035W_Classification.kea"
ratDataset = gdal.Open(clumpsImg, gdal.GA_Update)

print("Import Columns.")
MayBlue = rat.readColumn(ratDataset, "MayBlue")
MayGreen = rat.readColumn(ratDataset, "MayGreen")
MayRed = rat.readColumn(ratDataset, "MayRed")
MayNIR = rat.readColumn(ratDataset, "MayNIR")
MaySWIR1 = rat.readColumn(ratDataset, "MaySWIR1")
MaySWIR2 = rat.readColumn(ratDataset, "MaySWIR2")

JuneBlue = rat.readColumn(ratDataset, "JuneBlue")
JuneGreen = rat.readColumn(ratDataset, "JuneGreen")
JuneRed = rat.readColumn(ratDataset, "JuneRed")
JuneNIR = rat.readColumn(ratDataset, "JuneNIR")
JuneSWIR1 = rat.readColumn(ratDataset, "JuneSWIR1")
JuneSWIR2 = rat.readColumn(ratDataset, "JuneSWIR2")

print("Calculate Indices.")
# Normalised Difference Vegetation Index
MayNDVI = (MayNIR - MayRed) / (MayNIR + MayRed)
JuneNDVI = (JuneNIR - JuneRed) / (JuneNIR + JuneRed)
# Water Band Index, assumed here to be Blue / NIR
MayWBI = MayBlue / MayNIR
JuneWBI = JuneBlue / JuneNIR

# Write the indices back to the attribute table as new columns
rat.writeColumn(ratDataset, "MayNDVI", MayNDVI)
rat.writeColumn(ratDataset, "JuneNDVI", JuneNDVI)
rat.writeColumn(ratDataset, "MayWBI", MayWBI)
rat.writeColumn(ratDataset, "JuneWBI", JuneWBI)

# Close the dataset so the changes are written to disk
ratDataset = None
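If any segment, for example the background row 0 of the attribute table, has zero values in the bands used, the index calculations in the script above (such as the NDVI) produce NaN or infinite values, and NumPy typically issues a RuntimeWarning. A minimal, library-free sketch of guarded versions of a normalised difference and a band ratio is given below; the fill value, the function names and the example means are hypothetical, and with NumPy arrays a similar guard can be written with numpy.divide and its out and where arguments.

```python
def normalised_difference(a, b, fill=0.0):
    """(a - b) / (a + b) for each pair of values, or fill where a + b == 0."""
    return [(x - y) / (x + y) if x + y != 0 else fill for x, y in zip(a, b)]


def ratio(a, b, fill=0.0):
    """a / b for each pair of values, or fill where b == 0."""
    return [x / y if y != 0 else fill for x, y in zip(a, b)]


if __name__ == "__main__":
    nir = [0, 40, 35]   # hypothetical per-segment means; row 0 is background
    red = [0, 10, 25]
    blue = [0, 8, 30]
    print(normalised_difference(nir, red))  # NDVI
    print(ratio(blue, nir))                 # WBI, assumed Blue / NIR
```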