This section presents and discusses efficiency evaluations of FiPy. Programming in Python allows greater efficiency when designing and implementing new code, but execution is intrinsically less efficient than in the C or FORTRAN programming languages. These inefficiencies can be minimized by translating frequently used sections of code into C.
FiPy has been tested against an in-house phase field code, written at NIST, to model grain growth and subsequent impingement. This problem can be executed by running:
$ examples/phase/impingement/mesh20x20.py \
> --numberOfElements=10000 --numberOfSteps=1000
The raw CPU execution times for 10 time steps are presented in the
following table. Run times are in seconds and memory usage is in
kilobytes. The Kobayashi code appears under the heading FORTRAN, while
FiPy is run with and without inlining. The FiPy memory usage is taken
from the FiPy simulations, and the --no-cache flag is on in all cases
for the following table.
||FORTRAN (s)||FiPy memory (KiB)||FORTRAN memory (KiB)|
The plain Python version of FiPy, which uses Numeric for all array
operations, is around 17 times slower than the FORTRAN code. Using the
--inline flag, this penalty is reduced to about 9 times.
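Slowdown factors of this kind can be obtained by timing the same number of steps in each code and dividing. The following is a minimal sketch using only the Python standard library; `fast_step` and `slow_step` are hypothetical stand-ins for one time step of each code, not part of FiPy, and the measured ratio will vary by machine.

```python
import time


def cpu_time_per_step(step, n_steps=10):
    """Return the mean CPU time in seconds per call of `step`."""
    start = time.process_time()  # CPU time of this process, not wall-clock time
    for _ in range(n_steps):
        step()
    return (time.process_time() - start) / n_steps


def slowdown(candidate_seconds, baseline_seconds):
    """Return how many times slower the candidate is than the baseline."""
    return candidate_seconds / baseline_seconds


# Hypothetical workloads standing in for one time step of each code.
def fast_step():
    sum(range(10_000))


def slow_step():
    sum(i for i in range(100_000))


if __name__ == "__main__":
    fast = cpu_time_per_step(fast_step)
    slow = cpu_time_per_step(slow_step)
    print("slowdown factor:", slowdown(slow, fast))
```

Using `time.process_time()` rather than a wall-clock timer keeps the measurement closer to the raw CPU times quoted above, since time spent sleeping or waiting is excluded.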
It is hoped that in future releases of FiPy the process of C inlining
for Variable objects will be automated. This may yield efficiency
gains greater than those seen for this particular problem, since all
the Variable objects will be inlined. Recent analysis has shown that a
Variable with multiple operations could be up to 6 times faster at
calculating its value when inlined.
As presented in the above table, memory usage was also recorded for each FiPy simulation. From the table, once base memory usage is subtracted, each cell requires approximately 1.4 kilobytes of memory. Measuring the maximum memory spike is difficult under dynamic memory allocation, so these figures should only be used as a very rough guide. The FORTRAN memory usage is exact since memory is not allocated dynamically.
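One way to separate base memory from per-cell memory is to fit a straight line to memory usage against the number of cells: the intercept estimates the base usage and the slope the cost per cell. The sketch below assumes this linear model; the sample numbers are synthetic and chosen only to reproduce a slope of 1.4 KiB per cell, so they are not measurements from the table.

```python
def fit_memory(cell_counts, memory_kib):
    """Least-squares fit of memory = base + per_cell * cells.

    Returns (base, per_cell), both in KiB.
    """
    n = len(cell_counts)
    mean_x = sum(cell_counts) / n
    mean_y = sum(memory_kib) / n
    sxx = sum((x - mean_x) ** 2 for x in cell_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(cell_counts, memory_kib))
    per_cell = sxy / sxx                 # slope: KiB per cell
    base = mean_y - per_cell * mean_x    # intercept: base usage in KiB
    return base, per_cell


# Synthetic (hypothetical) data: 20000 KiB base plus 1.4 KiB per cell.
cells = [100, 400, 1600, 6400]
memory = [20_000 + 1.4 * c for c in cells]

base, per_cell = fit_memory(cells, memory)

if __name__ == "__main__":
    print("base (KiB):", base)
    print("per cell (KiB):", per_cell)
```

With real measurements the points will scatter around the line, which is consistent with treating the per-cell figure as a rough guide only.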
Efficiency comparison between caching flags
This table shows results for efficiency tests when using the caching
flags. Examples with more variables involved in complex expressions
show the largest improvement in memory usage. The --no-cache option
mainly prevents intermediate variables created by binary operations
from caching their values. This results in large memory savings while
not affecting run times substantially. The table below is for runs
with --inline switched on and with 102400 elements for each case. The
--no-cache flag is the default option.
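The trade-off behind the caching flags can be illustrated with a toy lazily evaluated variable. This is a simplified sketch, not FiPy's Variable implementation: the class `LazyVar`, its `cache` argument and the `evaluations` counter are all hypothetical names introduced here, and the sketch omits cache invalidation, which any real implementation needs when an input changes.

```python
class LazyVar:
    """Toy lazily evaluated variable (illustration only, not FiPy's Variable)."""

    def __init__(self, value=None, op=None, operands=(), cache=True):
        self._value = value        # stored value for leaf variables
        self._op = op              # binary operation for derived variables
        self._operands = operands
        self._cache = cache        # whether derived variables keep their result
        self._cached = None
        self.evaluations = 0       # how often this node was recomputed

    def _binary(self, other, op):
        if not isinstance(other, LazyVar):
            other = LazyVar(value=other)
        # The intermediate variable inherits the caching policy.
        return LazyVar(op=op, operands=(self, other), cache=self._cache)

    def __add__(self, other):
        return self._binary(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._binary(other, lambda a, b: a * b)

    def value(self):
        if self._op is None:
            return self._value
        if self._cache and self._cached is not None:
            return self._cached    # costs memory, saves recomputation
        self.evaluations += 1
        result = self._op(*(v.value() for v in self._operands))
        if self._cache:
            self._cached = result
        return result


def demo(cache):
    """Evaluate (x * 3 + 1) three times; report the value and recompute count."""
    x = LazyVar(value=2.0, cache=cache)
    intermediate = x * 3.0         # intermediate created by a binary operation
    result = intermediate + 1.0
    result.value()
    result.value()
    return result.value(), intermediate.evaluations


if __name__ == "__main__":
    print("caching on: ", demo(True))
    print("caching off:", demo(False))
```

With caching on, the intermediate holds a stored copy of its result and is computed once; with caching off, nothing is stored and it is recomputed on each request. For array-valued variables the stored copies are what consume memory, which is why expressions with many intermediates benefit most from disabling the cache.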
|Example||time per step||time per step||memory per cell||memory per cell|
Efficiency discussion of Pysparse and Trilinos
Trilinos provides multigrid capabilities which are beneficial for some problems, but it has significant overhead compared to Pysparse. The matrix-building step takes significantly longer in Trilinos, and the solvers also have more overhead costs in memory and performance than the equivalent Pysparse solvers. However, the multigrid preconditioning capabilities of Trilinos can, in some cases, provide enough of a speedup in the solution step to make up for the overhead costs. This depends greatly on the specifics of the problem, but is most likely when the problem is large and when Pysparse cannot solve the problem with an iterative solver and must use an LU solver, while Trilinos can still succeed with an iterative method.