This section will present results and discussion of efficiency evaluations with FiPy. Programming in Python allows greater efficiency when designing and implementing new code, but it has some intrinsic inefficiencies during execution as compared with the C or FORTRAN programming languages. These inefficiencies can be minimized by translating sections of code that are used frequently into C.
FiPy has been tested against an in-house phase field code, written at NIST, to model grain growth and subsequent impingement. This problem can be executed by running:
$ examples/phase/impingement/mesh20x20.py \ > --numberOfElements=10000 --numberOfSteps=1000
The raw CPU execution times for 10 time steps are presented in the following table. The run times are in seconds and the memory usage is in kilobytes. The Kobayashi code is given the heading of FORTRAN while FiPy is run with and without inlining. The memory usage is for FiPy simulations with the --inline. The --no-cache flag is on in all cases for the following table.
|Elements||FiPy (s)||FiPy --inline (s)||FORTRAN (s)||FiPy memory (KiB)||FORTRAN memory (KiB)|
The plain Python version of FiPy, which uses Numeric for all array operations, is around 17 times slower than the FORTRAN code. Using the --inline flag, this penalty is reduced to about 9 times slower.
It is hoped that in future releases of FiPy the process of C inlining for Variable objects will be automated. This may result in some efficiency gains, greater than we are seeing for this particular problem since all the Variable objects will be inlined. Recent analysis has shown that a Variable with multiple operations could be up to 6 times faster at calculating its value when inlined.
As presented in the above table, memory usage was also recorded for each FiPy simulation. From the table, once base memory usage is subtracted, each cell requires approximately 1.4 kilobytes of memory. The measurement of the maximum memory spike is hard with dynamic memory allocation, so these figures should only be used as a very rough guide. The FORTRAN memory usage is exact since memory is not allocated dynamically.
This table shows results for efficiency tests when using the caching flags. Examples with more variables involved in complex expressions show the largest improvement in memory usage. The --no-cache option mainly prevents intermediate variables created due to binary operations from caching their values. This results in large memory gains while not effecting run times substantially. The table below is with --inline switched on and with 102400 elements for each case. The --no-cache flag is the default option.
|Example||time per step --no-cache (s)||time per step --cache (s)||memory per cell --no-cache (KiB)||memory per cell --cache (KiB)|
Trilinos provides multigrid capabilities which are beneficial for some problems, but has significant overhead compared to PySparse. The matrix-building step takes significantly longer in Trilinos, and the solvers also have more overhead costs in memory and performance than the equivalent Pysparse solvers. However, the multigrid preconditioning capabilities of Trilinos can, in some cases, provide enough of a speedup in the solution step to make up for the overhead costs. This depends greatly on the specifics of the problem, but is most likely in the cases when the problem is larrge and when Pysparse cannot solve the problem with an iterative solver and must use an LU solver, while Trilinos can still have success with an iterative method.