The motivation for high-performance computing in reservoir simulation has always existed. From the earliest simulation models, computing resources have been severely taxed simply because the level of complexity desired by the engineer almost always exceeded the speed and memory of the hardware. High-speed vector processors such as the Cray of the late 1970s and early 1980s brought orders-of-magnitude improvements in computational speed and enabled production models of several hundred thousand cells. The relief brought by these models, unfortunately, was short-lived. The desire for the richer physics of compositional modeling and the introduction of geostatistically/structurally based geological models increased computational complexity beyond even the large-scale models of the vector processors.
Algebraic Multigrid (AMG) methods have proven to be efficient for the numerical solution of elliptic Partial Differential Equations (PDEs). In reservoir simulation, AMG is used together with the Constrained Pressure Residual (CPR) method to solve a partially decoupled pressure system. Recently, effort has been focused on improving the robustness of the AMG-CPR solver. This paper presents the performance of different AMG-CPR strategies on massive reservoir models. In addition, a solver-selection analysis is conducted, showing that dynamic selection of solvers has the potential to increase the overall efficiency and robustness of the simulation.
Numerous decoupling/preconditioning algorithms exist and have been shown to influence the pressure-matrix properties, some resulting in matrices whose characteristics are more favorable to AMG. Several decoupling/preconditioning strategies are investigated, such as Alternate Block Factorization (ABF), Quasi-IMPES (QI), and Dynamic Rowsum (DRS). The extracted pressure matrix may or may not be suitable for AMG, depending on its row sums, the signs of its diagonal entries, and the signs of its off-diagonal entries.
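To illustrate the kind of operation these decoupling strategies perform, the following sketch extracts a CPR pressure matrix with Quasi-IMPES-style weights. This is our own minimal dense illustration, not the simulator's implementation; the function name, unknown ordering, and dense layout are assumptions.

```python
import numpy as np

def quasi_impes_pressure_matrix(J, n_cells, n_eq):
    """Extract a CPR pressure matrix via Quasi-IMPES-style decoupling.

    Illustrative sketch (dense, serial): J is the block Jacobian with
    unknowns ordered [pressure, saturations...] per cell. For each cell,
    weights w are chosen so that w^T D has zeros in the saturation
    columns of the diagonal block D; the weighted pressure columns of
    each block row then form the decoupled pressure matrix.
    """
    e_p = np.zeros(n_eq)
    e_p[0] = 1.0                              # pressure equation selector
    Ap = np.zeros((n_cells, n_cells))
    for i in range(n_cells):
        D = J[i*n_eq:(i+1)*n_eq, i*n_eq:(i+1)*n_eq]
        w = np.linalg.solve(D.T, e_p)         # w^T D = e_p^T
        for j in range(n_cells):
            B = J[i*n_eq:(i+1)*n_eq, j*n_eq:(j+1)*n_eq]
            Ap[i, j] = w @ B[:, 0]            # weighted pressure column
    return Ap
```

The resulting matrix's row sums and diagonal/off-diagonal signs can then be inspected to decide whether it is a good candidate for AMG, as discussed above.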
The advantage of using AMG as a preconditioner is demonstrated by running the SPE10 case, and the AMG settings that yield the best performance for SPE10 are shared. A speedup of up to 4X is observed when using AMG with these settings versus the default solver in the in-house reservoir simulator, with the improvement depending on the number of processors used. SPE10 is a highly heterogeneous model that produces matrices favorable for AMG, i.e., pressure decoupling yields positive definite pressure matrices, which is not necessarily representative of industry models. A comparison is therefore made across a selection of models with a wide range of characteristics, followed by an examination of the convergence behavior of key industry cases under different decoupling strategies. The overall convergence behavior of the pressure and full systems is shown, and the best-performing decoupling algorithms for the particular models are discussed. Finally, the applicability and performance gain of selectively using AMG during a run is demonstrated.
Recent developments have been made in AMG methods, but their applicability to a wide range of massive real cases is yet to be explored. In this work, different decoupling methods are tested, the AMG behavior on massive real-field models is analyzed, the scalability is investigated, and AMG is selectively activated during a simulation run, shedding light on the potential of future work in which Artificial Intelligence (AI) dynamically selects the optimal solver.
High-resolution discretizations can be advantageous in compositional simulation to reduce excessive numerical diffusion that tends to mask shocks and fingering effects. In this work, we outline a fully implicit, dynamic, multilevel, high-resolution simulator for compositional problems on unstructured polyhedral grids. We rely on four ingredients: (i) sequential splitting of the full problem into a pressure and a transport problem, (ii) ordering of grid cells based on intercell fluxes to localize the nonlinear transport solves, (iii) higher-order discontinuous Galerkin (dG) spatial discretization with order adaptivity for the component transport, and (iv) a dynamic coarsening and refinement procedure. For purely cocurrent flow, and in the absence of capillary forces, the nonlinear transport system can be permuted to a lower block-triangular form. With counter-current flow caused by gravity or capillary forces, the nonlinear system of discrete transport equations will contain larger blocks of mutually dependent cells on the diagonal. In either case, the transport subproblem can be solved efficiently cell-by-cell or block-by-block because of the natural localization in the dG scheme. In addition, we discuss how adaptive grid and order refinement can effectively improve accuracy. We demonstrate the applicability of the proposed solver through a number of examples, ranging from simple conceptual problems with PEBI grids in two dimensions, to realistic reservoir models in three dimensions. We compare our new solver to the standard upstream-mobility-weighting scheme and to a second-order WENO scheme.
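The flux-based cell ordering of ingredient (ii) can be sketched as a topological sort of the directed flux graph. This is our own simplified variant for the purely cocurrent, capillarity-free case; the function name and flux tuple format are hypothetical, and counter-current flow would instead require grouping strongly connected components into diagonal blocks.

```python
from collections import defaultdict, deque

def flux_ordering(n_cells, fluxes):
    """Order cells so that each is solved after all upstream neighbors.

    fluxes: iterable of (i, j, q); q > 0 means flow from cell i to j.
    Assumes an acyclic flux graph (purely cocurrent flow), giving the
    lower-triangular structure that permits cell-by-cell solves.
    """
    succ = defaultdict(list)
    indeg = [0] * n_cells
    for i, j, q in fluxes:
        if q == 0.0:
            continue                  # no flow across this face
        if q < 0.0:
            i, j = j, i               # orient the edge downstream
        succ[i].append(j)
        indeg[j] += 1
    order = []
    ready = deque(c for c in range(n_cells) if indeg[c] == 0)
    while ready:                      # Kahn's algorithm
        c = ready.popleft()
        order.append(c)               # all upstream cells already listed
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return order
```

Solving the transport equations cell by cell in this order means each local nonlinear solve sees already-updated upstream saturations, which is what localizes the Newton work.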
Accurate and robust well modeling is essential for performing reservoir simulations of practical interest. The Multi-Segment Well (MSWell) model is able to describe the well topology and accurately represent the multiphase, multicomponent flow and transport behavior in the wellbore. The fully coupled (FC) method has been developed and widely applied to coupled reservoir and MSWell modeling due to its unconditional stability and consistent implementation. A local well solver can be applied to provide a better nonlinear preconditioner for the MSWell variables in order to accelerate the nonlinear convergence of the FC method.
However, solving the coupled MSWell and reservoir model in a fully implicit scheme can still present limitations in some practical applications. First, the well or surface-facility solver may be separate from the existing reservoir simulator, making it challenging to employ the fully implicit method. Second, complex linear and nonlinear solvers need to be designed to pair the specific well and reservoir models. These solvers have to account for the different flow characteristics and discretization domains of the reservoir and the MSWell. A sequential coupling scheme can become preferable in such situations.
The Sequential Fully Implicit (SFI) method splits the fully coupled reservoir and MSWell equations into two parts and solves them sequentially. Despite accomplishing an implicit coupling in a sequential scheme, SFI suffers from a slow outer-loop convergence rate, especially when the reservoir is strongly coupled with the wells, which is very often the case. The slow convergence is caused by the linear convergence rate of the fixed-point iteration used in SFI. Here, we develop a Sequential Implicit Newton (SIN) method for coupled MSWells. SIN incorporates a Newton update at the end of each sequential step to achieve quadratic convergence of the outer iterations, while requiring limited extra computational cost. Numerical results show that SIN requires a number of nonlinear Newton iterations comparable to FC on coupled heterogeneous reservoir and complex MSWell problems.
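The outer-loop idea can be sketched in a few lines. This is a toy illustration of the concept on a generic coupled system, not the authors' implementation; all names are hypothetical, and the "sequential sweep" stands in for the reservoir-then-well solves.

```python
import numpy as np

def sin_step(x, residual, jacobian, sequential_sweep):
    """One outer iteration of a sequential-implicit-Newton-style scheme.

    First perform the usual sequential (fixed-point) sweep over the
    subproblems, then apply one coupled Newton correction so the outer
    loop converges quadratically instead of linearly.
    """
    x = sequential_sweep(x)                            # SFI-style sweep
    x = x - np.linalg.solve(jacobian(x), residual(x))  # Newton correction
    return x
```

On a toy coupled pair of nonlinear equations, a handful of such steps drives the coupled residual to machine precision, whereas the plain sweep alone would converge only linearly.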
Thiele, Christopher (Rice University) | Araya-Polo, Mauricio (Shell International Exploration & Production, Inc.) | Alpak, Faruk Omer (Shell International Exploration & Production, Inc.) | Riviere, Beatrice (Rice University)
Direct numerical simulation of multiphase pore-scale flow is a computationally demanding task with strong requirements on time-to-solution for the prediction of relative permeabilities. In this paper, we describe the hybrid-parallel implementation of a two-phase two-component incompressible flow simulator using MPI, OpenMP, and general-purpose graphics processing units (GPUs), and we analyze its computational performance. In particular, we evaluate the parallel performance of GPU-based iterative linear solvers for this application, and we compare them to CPU-based implementations of the same solver algorithms. Simulations on real-life Berea sandstone micro-CT images are used to assess the strong scalability and computational performance of the different solver implementations and their effect on time-to-solution. Additionally, we use a Poisson problem to further characterize achievable strong and weak scalability of the GPU-based solvers in reproducible experiments. Our experiments show that GPU-based iterative solvers can greatly reduce time-to-solution in complex pore-scale simulations. On the other hand, strong scalability is currently limited by the unbalanced computing capacities of the host and the GPUs. The experiments with the Poisson problem indicate that GPU-based iterative solvers are efficient when weak scalability is desired. Our findings show that proper utilization of GPUs can help to make our two-phase pore-scale flow simulation computationally feasible in existing workflows.
Traditionally, preconditioners are used to damp slowly varying error modes in the linear-solver stage. State-of-the-art multilevel preconditioners use a sequence of aggressive restriction, coarse-grid correction, and prolongation operators to handle low-frequency modes on the coarse grid. High-frequency errors are then resolved by employing a smoother on the fine grid. In this paper, an algebraic smoothing-aggregation two-level preconditioner is implemented to solve different coupled problems.
The proposed method generalizes the existing MsRSB and smoothing-aggregation AMG methods. It does not require any coarse partitioning and can therefore be applied to a general unstructured topology of the fine scale. Inspired by smoothing-aggregation algebraic multigrid solvers, the preconditioner constructs basis functions that map error modes from the fine scale to low-frequency modes on the coarse scale. These basis functions are also used to reconstruct the unknown primary variables at the fine scale from their approximations at the coarse level.
The proposed preconditioner has been applied to challenging multiphysics problems, including fully coupled simulation of filtration and geomechanical processes as well as non-isothermal fluid-flow problems. The preconditioner provides a reasonably good approximation to the coupled physical processes and speeds up convergence. Compared to a traditional ILU0+GMRES linear solver, our preconditioner with a GMRES solver reduces the number of iterations by about a factor of three. In addition, the proposed method exhibits good theoretical scalability, which is essential for parallel simulations.
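The two-level construction described above can be sketched as follows. This is a minimal dense-matrix illustration under our own assumptions (piecewise-constant aggregates smoothed by one weighted-Jacobi step, Jacobi pre-/post-smoothing, exact coarse solve); it is not the paper's implementation.

```python
import numpy as np

def two_level_sa(A, aggregates, omega=2.0/3.0):
    """Build a two-level smoothing-aggregation preconditioner (sketch).

    A:          SPD matrix (dense here for brevity)
    aggregates: aggregates[i] = coarse aggregate index of fine row i
    Returns a callable M(r) that approximates A^{-1} r.
    """
    n = A.shape[0]
    nc = max(aggregates) + 1
    # Tentative piecewise-constant prolongation ...
    P0 = np.zeros((n, nc))
    for i, a in enumerate(aggregates):
        P0[i, a] = 1.0
    Dinv = 1.0 / np.diag(A)
    # ... smoothed by one weighted-Jacobi step into basis functions.
    P = P0 - omega * (Dinv[:, None] * (A @ P0))
    Ac = P.T @ A @ P                    # Galerkin coarse-level operator

    def apply(r):
        x = omega * Dinv * r            # pre-smoothing (weighted Jacobi)
        res = r - A @ x
        x = x + P @ np.linalg.solve(Ac, P.T @ res)  # coarse correction
        res = r - A @ x
        x = x + omega * Dinv * res      # post-smoothing
        return x
    return apply
```

The columns of `P` are the basis functions: they both restrict the fine-grid residual to the coarse level and prolong the coarse correction back, exactly the dual role described in the abstract.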
In this work, the scalability of the Algebraic Multiscale Solver (AMS) (Wang et al. 2014) for the pressure equation arising from incompressible flow in heterogeneous porous media is investigated on the massively parallel GPU architecture. The solver's robustness and scalability are compared against its carefully optimized implementation on the shared-memory multi-core architecture (Manea et al. 2016), which this work directly extends. Although several components of the AMS algorithm are directly parallelizable, its scalability on GPUs depends heavily on the underlying algorithmic details and data-structure design of each step: one needs to ensure favorable control and data flow on the GPU while extracting enough parallel work for a massively parallel environment. In addition, the type of algorithm chosen for each step greatly influences the overall robustness of the solver. Taking all these constraints into account, we have developed a GPU-based AMS that exploits the parallelism in every module of the solver, including both the setup phase and the solution phase. The performance of AMS, with carefully optimized algorithmic choices on the GPU for both phases, is demonstrated using highly heterogeneous 3D problems derived from the SPE10 benchmark (Christie et al. 2001), ranging in size from millions to tens of millions of cells. The GPU implementation is benchmarked on a massively parallel architecture consisting of NVIDIA Kepler K80 GPUs, and its performance is compared to an optimized multi-core AMS implementation (Manea et al. 2016) running on a shared-memory architecture with two packages of Intel's Haswell-EP Xeon(R) CPU E5-2667.
While the GPU-based AMS implementation shows good scalability for the solution stage, its setup stage is less efficient than on the CPU, mainly due to its dependence on a QR-based basis-function solver.
Gries, Sebastian (Fraunhofer Institute SCAI) | Metsch, Bram (Fraunhofer Institute SCAI) | Terekhov, Kirill M. (Energy Resources Engineering, Stanford University) | Tomin, Pavel (Energy Resources Engineering, Stanford University)
The consideration of geomechanical effects is becoming more and more important in reservoir simulations. Ensuring a stable simulation process often requires handling the entire problem, with all types of physical unknowns, fully implicitly. However, the resulting fully coupled linear systems pose challenges for linear solvers, and the number of approaches that can efficiently handle such systems is extremely limited.
System-AMG has demonstrated its efficiency for isothermal and thermal reservoir simulations. At the same time, AMG is known to be a robust and highly efficient linear solver for pure linear-elasticity problems. This paper discusses the combination of the advantages that AMG approaches provide for both types of physics, resulting in a robust and efficient solution scheme for the fully coupled linear system. The approach is demonstrated in the Automatic Differentiation General Purpose Research Simulator (AD-GPRS).
In the single-phase case, the overall Jacobian matrix takes the form of a constrained linear-elasticity system in which the flow unknowns serve as a Lagrange multiplier. In other words, a saddle-point system needs to be solved, where the flow and mechanics problems may come at very different scales. A natural relaxation method for this kind of system is given by Uzawa smoothing schemes, which provide a way to overcome the difficulties that other smoothers may encounter.
This approach is intuitive for single-phase problems, where Gauss-Seidel can be applied within an inexact Uzawa scheme. In the multiphase case, however, incomplete-factorization smoothers are required for the flow and transport part. We discuss their incorporation into an inexact Uzawa scheme, where different realizations are possible, each with its own advantages and disadvantages, and we propose an adaptive mechanism along with the outer Krylov solver to detect the best-suited realization for a given linear system. In the multiphase case, matrix preprocessing, for instance by Dynamic Row Summing, also needs to be considered; the preprocessing must now additionally reflect the requirements for the Uzawa scheme to be applicable.
We demonstrate the performance for widely used test cases as well as for real-world problems of practical interest.
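The basic Uzawa relaxation for the saddle-point structure described above can be sketched as follows. This is an illustrative toy, not the paper's smoother: the inner solve is exact here for clarity, whereas an inexact Uzawa scheme would replace it with a few Gauss-Seidel or incomplete-factorization sweeps, and all names are our own.

```python
import numpy as np

def uzawa(A, B, f, g, omega=1.0, n_iter=100, inner_solve=None):
    """Uzawa iteration for the saddle-point system
        [ A  B^T ] [u]   [f]
        [ B   0  ] [p] = [g]
    where A is the (SPD) elasticity block and p the Lagrange multiplier.
    """
    if inner_solve is None:
        # Placeholder exact solve; "inexact" Uzawa would use cheap sweeps.
        inner_solve = lambda r: np.linalg.solve(A, r)
    u = np.zeros(A.shape[0])
    p = np.zeros(B.shape[0])
    for _ in range(n_iter):
        u = u + inner_solve(f - A @ u - B.T @ p)  # elasticity update
        p = p + omega * (B @ u - g)               # multiplier (flow) update
    return u, p
```

The relaxation parameter `omega` must be chosen relative to the Schur complement B A^{-1} B^T for the multiplier update to converge, which is one reason an adaptive mechanism around the outer Krylov solver is attractive.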
Chung, Traiwit (University of New South Wales) | Wang, Ying Da (University of New South Wales) | Armstrong, Ryan T. (University of New South Wales) | Mostaghimi, Peyman (University of New South Wales)
Direct simulation of flow on microcomputed-tomography (micro-CT) images of rocks is widely used for the calculation of permeability. However, direct numerical methods are computationally demanding. A rapid and robust method is proposed to solve the elliptic flow equation. Segmented micro-CT images are used to calculate a local conductivity in each voxel, and the elliptic flow equation is then solved on the images using the finite-volume method. The numerical method is optimized in terms of memory usage by using sparse-matrix modules to eliminate the memory overhead associated with both the inherent sparsity of the finite-volume two-point flux-approximation (TPFA) method and the presence of zero conductivity in impermeable grain cells. The permeabilities estimated for a number of sandstone and carbonate micro-CT images are compared against the estimates of other solvers; the results show a difference of approximately 11%, while the computational time is 80% lower. Local conductivity can furthermore be assigned directly to matrix voxels without loss of generality, allowing the pore-scale finite-volume solver (PFVS) to solve for flow in a permeable matrix as well as in open pore space. The method has been further developed to include the effect of microporosity in flow simulation.
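The TPFA assembly over pore voxels only can be sketched on a small 2D image. This is our own minimal variant of the idea, not the PFVS code: the function name, the left-to-right Dirichlet drive, and the half-cell boundary transmissibilities are assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def tpfa_pressure(cond, p_in=1.0, p_out=0.0):
    """TPFA finite-volume pressure solve on a 2D voxel image (sketch).

    cond: 2D array of voxel conductivities; zero marks impermeable grain.
    Flow is driven left-to-right by Dirichlet pressures p_in / p_out.
    Only non-grain voxels enter the sparse system, mirroring the memory
    optimization described in the abstract. Returns a pressure field
    with NaN in grain voxels.
    """
    ny, nx = cond.shape
    pore = cond > 0
    idx = -np.ones(cond.shape, dtype=int)
    idx[pore] = np.arange(pore.sum())         # compact unknown numbering
    n = int(pore.sum())
    rows, cols, vals = [], [], []
    diag = np.zeros(n)
    rhs = np.zeros(n)
    for y in range(ny):
        for x in range(nx):
            if not pore[y, x]:
                continue
            i = idx[y, x]
            for dy, dx in ((0, 1), (1, 0)):   # right and down neighbors
                y2, x2 = y + dy, x + dx
                if y2 < ny and x2 < nx and pore[y2, x2]:
                    k1, k2 = cond[y, x], cond[y2, x2]
                    t = 2.0 * k1 * k2 / (k1 + k2)   # harmonic average
                    j = idx[y2, x2]
                    diag[i] += t
                    diag[j] += t
                    rows += [i, j]; cols += [j, i]; vals += [-t, -t]
            if x == 0:                        # inlet Dirichlet, half cell
                t = 2.0 * cond[y, x]
                diag[i] += t
                rhs[i] += t * p_in
            if x == nx - 1:                   # outlet Dirichlet, half cell
                t = 2.0 * cond[y, x]
                diag[i] += t
                rhs[i] += t * p_out
    A = sp.csr_matrix((vals + list(diag),
                       (rows + list(range(n)), cols + list(range(n)))),
                      shape=(n, n))
    p = spla.spsolve(A, rhs)
    out = np.full(cond.shape, np.nan)
    out[pore] = p
    return out
```

Because grain voxels never receive an unknown index, the sparse system contains no trivial rows, which is precisely the memory saving the abstract describes.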
Mohajeri, Sina (Engineering Support & Technology Development Company) | Eslahi, Reza (Engineering Support & Technology Development Company) | Bakhtiari, Maryam (Engineering Support & Technology Development Company) | Zeinali, Mostafa (Engineering Support & Technology Development Company) | Rajabi, Hamed (Engineering Support & Technology Development Company) | Alizadeh, Ali (Engineering Support & Technology Development Company) | Sharifi, Ebrahim (Engineering Support & Technology Development Company) | Mortezazadeh, Emad (Engineering Support & Technology Development Company)
The popularity of parallel reservoir simulation has increased in recent years due to the availability of multi-core CPUs and the opportunity for run-time reduction, especially in complex and fractured reservoir models that include large amounts of data. However, the accuracy of classical domain-decomposition techniques decreases significantly as the number of processing units increases. The scope of this work is mainly the development of multi-threaded algorithms to overcome this problem in parallel reservoir simulations. First, a newly developed, parallel, three-dimensional, fully implicit, three-phase black-oil reservoir simulator is introduced. For solving the large-scale linear systems produced by giant black-oil models, as are common in real-world reservoirs, a BiCG-Stabilized linear solver is constructed, enhanced with various preconditioners, including Blocked Incomplete LU factorization and Constrained Pressure Residual with Algebraic Multi-Grid as its pressure smoother, with a variety of parallelization techniques applied in each part. The presented multi-threaded algorithms can utilize any user-defined number of CPU cores. The algorithms have been evaluated on SPE10 with 1,122,000 grid blocks and on a real data file containing 277,875 grid blocks. The results indicate that increasing the number of CPU threads does not have any negative effect on simulation performance as long as memory-bandwidth limits are not reached. For the SPE10 data file, speedup ratios of 1.70, 2.4, and 2.3 (relative to a single thread) have been observed for 2, 4, and 6 parallel threads, respectively. In the real case, the simulation speed increased by factors of 1.49, 1.74, and 1.75 for 2, 4, and 6 parallel threads, respectively. With the multi-threaded algorithms, a novel approach in parallel simulation, the solution configuration does not change from the single-core case and follows exactly the same path.
This, however, is not the case for simulators that use domain-decomposition techniques: those techniques introduce numerical errors into the solution of the linear systems, which can lead to convergence problems in the overall system.
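The two-stage CPR preconditioner mentioned in several of these abstracts has a simple algebraic skeleton, sketched below. This is an illustrative outline only: the operator arguments are caller-supplied placeholders (in a simulator, `pressure_solve` is where AMG acts and `smooth` would be an ILU-type sweep), and the function name is our own.

```python
import numpy as np

def cpr_apply(A, p_idx, pressure_solve, smooth, r):
    """Apply a two-stage CPR-style preconditioner to a residual r.

    Stage 1: restrict r to the pressure unknowns (indices p_idx), solve
    the pressure system, and prolong the correction back.
    Stage 2: apply a cheap full-system smoother to what remains.
    """
    x = np.zeros_like(r)
    x[p_idx] = pressure_solve(r[p_idx])   # stage 1: pressure correction
    x = x + smooth(r - A @ x)             # stage 2: full-system smoothing
    return x
```

A convenient sanity check of the plumbing: if both stages use exact solves, the two-stage composition returns exactly A^{-1} r, since the second stage removes whatever residual the first stage left behind.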