
Advanced Python for Scientific Computing
Michael Milligan
[email protected]

Follow along!
https://www.msi.umn.edu/content/programming
Files in: /home/support/public/tutorials/PythonSciComp/
To get the most out of this…
• Basic knowledge of Python
• Working Python install (feel free to use ours!)
– Enthought Python Distribution / Canopy provides scientific and math libraries pre-installed
• MSI login + SSH or NX
• Follow along!
Why Python for Scientific Computing?
• Rapid development
– Easy, readable syntax
– Versatile tools for experimentation/learning
– Comprehensive libraries
• Powerful features
– Process data at near “native code” speeds
– Excellent visualization packages
– Comprehensive libraries
When you leave today, you should be able to…
• Program interactively with IPython
• Understand the basics of NumPy and SciPy
• Efficiently compute with large arrays of data
• Load and save data to/from files on disk
• Use matplotlib to plot data
• Take advantage of supercomputing resources with parallel computing
• Know where to turn for more help with these topics
Details
• We are describing the Enthought Python Distribution.
– Essentially: a pre-assembled compilation of Python 2.7 + numpy, scipy, and other useful libraries
– Free for academic use; a basic version is free for non-commercial use
– Your computers, departments, etc. may have a different version of Python installed. Everything we will see today is open source.
• At MSI: module load python-epd

Workshop Conventions
• UNIX shell commands are indicated with the percent sign.
• IPython interpreter commands have In/Out labels.
• Code marked with neither sign is Python code that should be entered into a text file.
IPython: Interactive Python
• Powerful environment for interactive work
• Run as “ipython” from any terminal
• --pylab option auto-loads numpy and sets up graphics for plotting
• Inspect any object with “?” or help()
IPython: Interactive Python
• Build up a workspace of objects and functions
• Full history access through Out[], %recall, up/down arrow keys
• %load, %edit, or %run external files
• Lots more: type %magic
NumPy and SciPy
• NumPy provides:
– The basic array and matrix data types
– Efficient implementations of low-level math operations
– A large library of high-level math functions built from efficient primitives
• SciPy provides:
– A home for a wide variety of open-source mathematical and scientific algorithms
– Modules for optimization, signal processing, linear algebra, statistics, interpolation, and more
NumPy arrays
• Array data type with vectorized operations (similar to Matlab or IDL)
• Supports the same indexing, slicing, and iteration as the Python list type
• …except every element is of the same data type
• …so they can be stored in memory packed like C arrays
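A minimal sketch of these points, assuming Python 3 and a current NumPy (the array values are arbitrary): every element shares one dtype, the data occupy itemsize bytes per element packed like a C array, and arithmetic applies to every element without an explicit loop.

```python
import numpy as np

# Every element of a NumPy array shares a single dtype.
a = np.array([1, 2, 3, 4], dtype=np.float64)
print(a.dtype)        # float64
print(a.itemsize)     # 8 bytes per element
print(a.nbytes)       # 4 elements * 8 bytes = 32, stored contiguously

# Mixed ints and floats are upcast to one common type.
b = np.array([1, 2.5])
print(b.dtype)        # float64

# List-style indexing, slicing and len() still work...
print(len(a), a[0], a[-2:])

# ...but arithmetic is vectorized: it applies to every element at once.
c = a * 2 + 1
print(c)              # [3. 5. 7. 9.]
```

For comparison, with a plain Python list, [1, 2] * 2 repeats the list instead of doubling each element.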
NumPy arrays are fast
Here we are comparing a “pure Python” loop to the equivalent in NumPy.
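The slide's own benchmark is not reproduced in this text. The following is a minimal sketch of the same kind of comparison, assuming Python 3 and a current NumPy; the sum-of-squares task and the array size are illustrative choices. No timings are listed because they depend on the machine.

```python
import timeit

import numpy as np

N = 1_000_000
xs = list(range(N))                     # plain Python list of ints
arr = np.arange(N, dtype=np.float64)    # the same values as a packed array


def pure_python_sum_of_squares(values):
    # Interpreted loop: one Python-level operation per element.
    total = 0.0
    for v in values:
        total += v * v
    return total


def numpy_sum_of_squares(values):
    # Vectorized: the loop runs in compiled code inside NumPy.
    return float(np.dot(values, values))


t_py = timeit.timeit(lambda: pure_python_sum_of_squares(xs), number=3)
t_np = timeit.timeit(lambda: numpy_sum_of_squares(arr), number=3)
print(f"pure Python: {t_py:.3f} s   numpy: {t_np:.3f} s")
```

The two results agree only approximately, since the sums exceed the range where float64 is exact and the two versions add in a different order.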
Multidimensional arrays
• NumPy arrays are rectangles in arbitrarily many dimensions
• + - * / operate element-by-element for same-shape arrays
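A minimal sketch, assuming Python 3.5+ for the @ operator (np.dot(A, B) is the older equivalent): arrays can have any number of dimensions, and the arithmetic operators work element by element, so a matrix product has to be requested separately.

```python
import numpy as np

# A 2 x 3 x 4 "rectangle" in three dimensions.
M = np.arange(24).reshape(2, 3, 4)
print(M.shape, M.ndim)   # (2, 3, 4) 3

# + - * / act element by element on same-shape arrays.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[10.0, 20.0], [30.0, 40.0]])
S = A + B    # elementwise sum: 11, 22, 33, 44
P = A * B    # elementwise product, NOT a matrix product
Q = B / A    # every entry is 10.0

# The matrix product is a separate operator.
MP = A @ B   # [[70, 100], [150, 220]]
print(S, P, Q, MP, sep="\n")
```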
Array slicing
• Index notation gives access to any “slice” of an array
• Array slices can be assigned – this changes the original array
• X = M[1,:,:].copy() would avoid changing M
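A minimal sketch of the slicing rules above (array shape and values are arbitrary): a basic slice is a view that shares memory with the original array, so assigning through it changes the original, while .copy() gives independent storage.

```python
import numpy as np

M = np.zeros((3, 4, 5))

# A slice is a view onto M's memory, not a copy.
view = M[1, :, :]
view[:] = 7.0            # assigning through the view...
print(M[1].sum())        # ...changes M: 4 * 5 * 7 = 140.0

# .copy() gives independent storage, so M is untouched.
X = M[1, :, :].copy()
X[:] = -1.0
print(M[1, 0, 0])        # still 7.0

# Slices can also be assigned directly.
M[0, ::2, :] = 1.0       # every other row of the first plane
print(M[0].sum())        # 2 rows * 5 columns = 10.0
```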
Other common methods
• NumPy arrays have many useful built-in methods
Other common methods
• …and the numpy module provides more
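A short, non-exhaustive sampler of the kind of methods and module functions these two slides refer to; the particular selection is an assumption, since the slides' own examples are not in this text.

```python
import numpy as np

a = np.array([[3.0, 1.0, 2.0],
              [6.0, 5.0, 4.0]])

# Built-in array methods
print(a.sum(), a.mean(), a.max())   # 21.0 3.5 6.0
print(a.sum(axis=0))                # column sums: [9. 6. 6.]
print(a.argmin())                   # index into the flattened array: 1
print(a.T.shape)                    # transpose: (3, 2)
print(a.reshape(3, 2).shape)        # same data, new shape: (3, 2)

# Functions from the numpy module itself
print(np.sort(a, axis=1))           # each row sorted independently
print(np.cumsum(a[0]))              # running total: [3. 4. 6.]
print(np.linspace(0.0, 1.0, 5))     # 5 evenly spaced points from 0 to 1
print(np.concatenate([a[0], a[1]])) # join two 1-D arrays end to end
```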
Conditions and tests
• Vectorized logical operators + indexing functions
• Output of index functions can be used to slice arrays
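A minimal sketch of vectorized conditions, with arbitrary sample data: comparisons yield boolean arrays, which can slice directly, and index functions such as np.nonzero return positions that can be used to slice as well.

```python
import numpy as np

x = np.array([4, -2, 7, 0, -5, 3])

# Vectorized comparisons produce boolean arrays (True where x > 0).
positive = x > 0

# Combine conditions with & | ~ (the parentheses are required).
in_range = (x > -3) & (x < 5)
print(x[in_range])            # 4, -2, 0, 3

# Boolean arrays can slice directly...
print(x[positive])            # 4, 7, 3

# ...and index functions return positions that can also be used to slice.
idx = np.nonzero(positive)[0]
print(idx, x[idx])            # positions 0, 2, 5 -> values 4, 7, 3

# np.where(cond, a, b) picks elementwise between two alternatives.
clipped = np.where(x < 0, 0, x)
print(clipped)                # negatives replaced by 0

print(np.any(x == 0), np.all(x < 10))   # True True
```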
More useful numpy modules
• numpy.fft – FFTs, forward/inverse, 1-D and N-D
• numpy.random – generate random numbers, many distributions to choose from
• numpy.matrix – a special array class that obeys matrix math
• numpy.polynomial – module for representing and manipulating arbitrary polynomials
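A minimal sketch touching three of these modules, assuming a current NumPy (np.random.default_rng needs NumPy 1.17 or later; older code calls functions such as np.random.normal directly). numpy.matrix is left out: current NumPy documentation recommends regular arrays with the @ operator instead. The signal, seed, and polynomial are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial

# numpy.fft: recover the frequency of a sampled sine wave.
n, dt = 256, 1.0 / 256                # 256 samples over 1 second
t = np.arange(n) * dt
signal = np.sin(2 * np.pi * 10 * t)   # a 10 Hz tone
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, d=dt)
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)                           # 10.0

# numpy.random: a seeded generator gives reproducible draws.
rng = np.random.default_rng(seed=42)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(samples.shape)                  # (1000,)

# numpy.polynomial: p(x) = 1 - 3x + 2x^2 = (1 - x)(1 - 2x)
p = Polynomial([1, -3, 2])            # coefficients in increasing degree
print(p(2.0))                         # 1 - 6 + 8 = 3.0
print(np.sort(p.roots()))             # roots 0.5 and 1.0
```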
Plotting made easy
• Matplotlib provides high-quality 2-D (and some 3-D) plotting
– Display in a window or output to PDF, SVG, PNG, etc.
– Implemented as a modular object-oriented system
• Pylab provides a Matlab-ish interactive interface to Matplotlib
– Access with ipython --pylab
– Defaults to popping up plots in a separate window
Some basic examples…
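The slide's own plots are not in this text. As a stand-in, here is a minimal sketch of a basic line plot using matplotlib's object-oriented pyplot interface rather than pylab mode. It assumes the non-interactive "Agg" backend so it runs without a display and writes a PNG to a temporary directory; in an interactive ipython --pylab session, plt.show() would pop up a window instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")          # non-interactive backend: write files, no window
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), "--", label="cos(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("value")
ax.set_title("A basic line plot")
ax.legend()

# The output format follows the file extension (.png, .pdf, .svg, ...).
out_path = os.path.join(tempfile.mkdtemp(), "basic_plot.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)
print(out_path)
```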
Some advanced examples…
• These examples are from the matplotlib.org examples section…
SciPy expands the menu
• Clustering algorithms (scipy.cluster)
• Integration and ODEs (scipy.integrate)
• Interpolation (scipy.interpolate)
• Input and output (scipy.io)
• Linear algebra (scipy.linalg)
• Multi-dimensional image processing (scipy.ndimage)
• Optimization and root finding (scipy.optimize)
• Signal processing (scipy.signal)
• Sparse matrices (scipy.sparse)
• Spatial algorithms and data structures (scipy.spatial)
• Special functions (scipy.special)
• Statistical functions (scipy.stats)
• And then some…
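A minimal sketch using three of these subpackages on problems whose answers are known in closed form, so the results are easy to check. It assumes a reasonably recent SciPy (solve_ivp and CubicSpline are the newer interfaces, not the ones available in the EPD of the time); the specific problems are illustrative.

```python
import numpy as np
from scipy import integrate, interpolate, optimize

# scipy.integrate: definite integral of sin(x) from 0 to pi (exact value 2).
value, abs_err = integrate.quad(np.sin, 0.0, np.pi)
print(value)

# scipy.integrate: solve the ODE dy/dt = -k*y, y(0) = 1 (exact: exp(-k*t)).
k = 0.5
sol = integrate.solve_ivp(lambda t, y: -k * y, (0.0, 4.0), [1.0],
                          rtol=1e-8, atol=1e-10)
print(sol.y[0, -1], np.exp(-k * 4.0))   # numerical vs exact at t = 4

# scipy.optimize: root of cos(x) - x, bracketed in [0, 1].
root = optimize.brentq(lambda x: np.cos(x) - x, 0.0, 1.0)
print(root)

# scipy.interpolate: cubic spline through 8 coarse samples of sin(x).
xs = np.linspace(0.0, np.pi, 8)
spline = interpolate.CubicSpline(xs, np.sin(xs))
print(spline(np.pi / 2))                # close to sin(pi/2) = 1
```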
SciPy is also fast
• Most SciPy routines use fast NumPy low-level math operations
• Some SciPy routines use highly optimized external libraries
– E.g. scipy.linalg links to BLAS, LAPACK or MKL behind the scenes
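A minimal sketch of scipy.linalg calls that hand their work to LAPACK; the 3×3 matrix is an arbitrary symmetric positive-definite example. Which BLAS/LAPACK build is actually linked can be checked with np.show_config() or scipy.show_config().

```python
import numpy as np
from scipy import linalg

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Solve A x = b (LAPACK does the factorization underneath).
x = linalg.solve(A, b)
print(x, np.allclose(A @ x, b))

# A is symmetric positive definite, so a Cholesky factor L exists with L L^T = A.
L = linalg.cholesky(A, lower=True)
print(np.allclose(L @ L.T, A))

# Eigenvalues of a symmetric matrix, returned in ascending order.
w = linalg.eigh(A, eigvals_only=True)
print(w)
```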
Data on disk
• Chances are you want to load and save data
• numpy and scipy.io offer a variety of facilities
Data on disk: text files
• Very common for smaller data sets: simple columns of numbers
• numpy.loadtxt() – simple interface, good defaults
• numpy.genfromtxt() – more complex, handles unusual formatting, comments, missing values, etc.
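A minimal sketch of both readers. To stay self-contained it reads small in-memory "files" through io.StringIO (both functions also accept a filename); the column names and values are made up.

```python
import io

import numpy as np

# Simple whitespace-separated columns with a comment line.
simple = io.StringIO("""# time  temperature
0.0  20.1
1.0  20.4
2.0  20.9
""")
data = np.loadtxt(simple)        # lines starting with '#' are skipped
print(data.shape)                # (3, 2)
time, temp = data.T              # unpack the columns

# Messier CSV: a header row and a missing value.
messy = io.StringIO("""time,temperature
0.0,20.1
1.0,
2.0,20.9
""")
table = np.genfromtxt(messy, delimiter=",", skip_header=1,
                      filling_values=np.nan)
print(table)                     # the empty field becomes nan
```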
Data on disk: text files
• numpy.savetxt() – write arrays as columns of numbers
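A minimal round-trip sketch: write two 1-D arrays as columns with numpy.savetxt, then read them back with numpy.loadtxt. The file name, number format, and header text are illustrative choices.

```python
import os
import tempfile

import numpy as np

t = np.linspace(0.0, 1.0, 5)
y = t ** 2
path = os.path.join(tempfile.mkdtemp(), "results.txt")

# column_stack turns 1-D arrays into columns; fmt, header and delimiter are optional.
np.savetxt(path, np.column_stack([t, y]), fmt="%.6f",
           header="t  y", delimiter="  ")

# The header is written with a leading "# ", so loadtxt skips it on reading.
back = np.loadtxt(path)
print(back.shape)    # (5, 2)
```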
Data on disk: binary formats
• Binary data is much more scalable
• Smaller files on disk
• Faster to load and save
• May be necessary to exchange data with other software
• Stick to portable (machine-independent) formats
Data on disk: binary formats
• NumPy native format (.npy)
– numpy.load() and numpy.save()
– Or use numpy.savez() to store many arrays in a single .npz file (numpy.savez_compressed() also compresses it)
– Fast, portable, but mostly only supported by Python
• scipy.io.matlab – support for Matlab (.mat)
– scipy.io.loadmat() and scipy.io.savemat()
• scipy.io.idl – read (no save) IDL .sav files
– scipy.io.readsav()
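A minimal round-trip sketch of the NumPy-native and Matlab formats, writing to a temporary directory (file names are illustrative). IDL reading is not shown because scipy.io.readsav() needs an existing .sav file.

```python
import os
import tempfile

import numpy as np
from scipy import io as sio

workdir = tempfile.mkdtemp()
a = np.arange(12, dtype=np.float64).reshape(3, 4)
b = np.linspace(0.0, 1.0, 7)

# NumPy native format: one array per .npy file.
np.save(os.path.join(workdir, "a.npy"), a)
a_back = np.load(os.path.join(workdir, "a.npy"))

# Several named arrays in one compressed .npz archive.
np.savez_compressed(os.path.join(workdir, "run.npz"), a=a, b=b)
with np.load(os.path.join(workdir, "run.npz")) as archive:
    b_back = archive["b"]

# Matlab .mat files: a dict mapping variable name -> array.
sio.savemat(os.path.join(workdir, "run.mat"), {"a": a, "b": b})
mat = sio.loadmat(os.path.join(workdir, "run.mat"))
print(mat["a"].shape, mat["b"].shape)   # (3, 4) (1, 7): 1-D arrays come back as rows
```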
Data on disk: binary formats
• Many standard formats supported
• scipy.io.netcdf – NetCDF3 interface
• h5py exposes the HDF5 API
• PyTables is an excellent high-level interface to HDF5
• pyfits for FITS datasets (now maintained as astropy.io.fits)
• Etc…
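A minimal sketch of the NetCDF3 interface, using the netcdf_file class that SciPy exports from scipy.io. h5py and PyTables are separate installs, so they are not shown. The file, dimension, and attribute names are illustrative.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

path = os.path.join(tempfile.mkdtemp(), "series.nc")

# Write: define a dimension, then a variable along it, plus attributes.
with netcdf_file(path, "w") as f:
    f.history = "created by a workshop example"
    f.createDimension("time", 5)
    tvar = f.createVariable("time", "d", ("time",))
    tvar[:] = np.arange(5.0)
    tvar.units = "seconds"

# Read: mmap=False copies data into memory so it stays valid after closing.
with netcdf_file(path, "r", mmap=False) as f:
    t = f.variables["time"][:].copy()
    units = f.variables["time"].units   # string attributes come back as bytes
print(t, units)
```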
Scaling up with parallelization
• For big jobs you will eventually want to parallelize your code
• The standard Python interpreter runs only one thread of Python code at a time (the global interpreter lock), so multi-process is usually best
• Approach depends on the problem you need to solve
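A minimal multi-process sketch using the standard library's multiprocessing.Pool. The simulate function is a hypothetical stand-in for an independent, CPU-heavy task, and 4 worker processes is an arbitrary choice. The if __name__ == "__main__" guard matters: on platforms that start workers by re-importing the script, it keeps each worker from launching its own pool.

```python
from multiprocessing import Pool

import numpy as np


def simulate(seed):
    """Hypothetical workload: mean of 100,000 normal draws for one seed."""
    rng = np.random.default_rng(seed)
    return float(rng.normal(size=100_000).mean())


if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter,
    # so the interpreter lock does not serialize the work across them.
    with Pool(processes=4) as pool:
        results = pool.map(simulate, range(8))
    print(len(results), results[:2])
```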
Parallel processes
• Many jobs need to process lots of data but don't need to communicate amongst themselves
• Sometimes called “embarrassingly parallel”
• GNU Parallel – a simple way to launch jobs
• Launch one job for every file in a dir, line in a file, etc.
• Can work with PBS on Itasca to use many nodes
GNU Parallel example
• -j should match ppn (unless you know what you're doing) – this is processes per node
• Will run one job per line of input on stdin or in argfile – max of nodes * ppn running at once
• See “man parallel” for more features
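The slide's own command is not recoverable from this text. As an illustration, here is a hypothetical per-file Python script of the kind GNU Parallel could launch once per input file; the script name, the task (column means of a numeric text file), and the file layout are all assumptions.

```python
import sys

import numpy as np


def summarize(path):
    """Hypothetical per-file task: mean of each column of a numeric text file."""
    data = np.loadtxt(path, ndmin=2)   # ndmin=2 keeps a single row as 2-D
    return data.mean(axis=0)


if __name__ == "__main__":
    # GNU Parallel substitutes one input (here, a file name) per job.
    for path in sys.argv[1:]:
        means = summarize(path)
        print(path, " ".join(f"{m:.6g}" for m in means))
```

Saved as summarize_columns.py, it might be launched with 8 simultaneous jobs (matching a hypothetical ppn=8) over every .txt file in a hypothetical data directory:
% parallel -j 8 python summarize_columns.py ::: data/*.txt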
MPI for Python
• MPI, the “Message Passing Interface”, enables parallel processes to communicate efficiently
• Commonly one process will be the “controller” and manage worker processes
• Inherent support for scatter-gather operations
• MPI is well-supported on our clusters
• mpi4py interfaces to MPI from inside Python
• Caveat for MPI gurus: numpy does not have distributed arrays yet, which complicates some algorithms
Example with mpi4py
• Simple “Hello world” script
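The original script is not in this text. A minimal sketch of an mpi4py "Hello world", assuming Python 3; the fallback to a single serial process when mpi4py is not installed is an addition so the file still runs anywhere.

```python
try:
    from mpi4py import MPI
except ImportError:  # assumption: without mpi4py, behave as one serial process
    MPI = None


def greeting(rank, size):
    return f"Hello world from process {rank} of {size}"


if MPI is not None:
    comm = MPI.COMM_WORLD                  # all processes started by the launcher
    rank, size = comm.Get_rank(), comm.Get_size()
else:
    rank, size = 0, 1

print(greeting(rank, size))
```

Saved as a hypothetical hello_mpi.py, it could be started on 4 processes with an MPI launcher, printing one line per process in no guaranteed order:
% mpirun -np 4 python hello_mpi.py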
More with mpi4py
• Possible to pass numpy arrays as buffers
More with mpi4py
• Also works with (pickle-able) Python objects
• Much slower than passing numpy arrays as buffers, but very convenient
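A minimal sketch contrasting the two mpi4py styles: capitalized Send/Recv pass a numpy array's memory as a buffer, while lowercase send/recv pickle arbitrary Python objects. It needs at least two processes; the fallback when mpi4py is missing, the tags, and the message contents are assumptions for illustration.

```python
import numpy as np

try:
    from mpi4py import MPI
except ImportError:  # assumption: without mpi4py the script does nothing
    MPI = None


def exchange(comm):
    """Rank 0 sends an array and a dict to rank 1; rank 1 returns both."""
    rank, size = comm.Get_rank(), comm.Get_size()
    if size < 2:
        return None                       # needs at least two processes
    if rank == 0:
        data = np.arange(10, dtype="d")
        # Capitalized Send: the array's memory is passed as a buffer (fast).
        comm.Send([data, MPI.DOUBLE], dest=1, tag=11)
        # Lowercase send: any pickle-able object (slower, very convenient).
        comm.send({"step": 1, "label": "run-a"}, dest=1, tag=12)
    elif rank == 1:
        buf = np.empty(10, dtype="d")     # the receiver preallocates the buffer
        comm.Recv([buf, MPI.DOUBLE], source=0, tag=11)
        meta = comm.recv(source=0, tag=12)
        return buf, meta
    return None


if MPI is not None:
    result = exchange(MPI.COMM_WORLD)
    if result is not None:
        buf, meta = result
        print("rank 1 received", buf.sum(), meta)
```

Run with at least two processes, for example:
% mpirun -np 2 python exchange_demo.py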
Too much to cover…
• IPython notebook (now developed as Jupyter) – connect to IPython with a browser for a Mathematica-like notebook interface
• PyCUDA and PyOpenCL – GPU computing
• SymPy – Mathematica-style symbolic math
• Databases are easy to connect to Python; or use an advanced big data toolkit like Pandas or PyTables
IPython notebook example
• Example: IPython notebook with pylab and sympy
– notebook creates a graphical log in a browser
– sympy: symbolic CAS
• To try this:
Community and Documentation
• Actively developed and supported
• Excellent documentation
– www.python.org/doc
– scipy.org
– wiki.scipy.org
– ipython.org
– matplotlib.org
– mpi4py.scipy.org
Next Step
• Hands-on
• You can also run the examples on your laptop's Python distribution
• Enthought is installed in all labs and on supercomputers at MSI
– Full academic version of Enthought Canopy installed (not default yet)
• Questions!
