Dexterous multi-fingered robotic hands are a promising platform for performing a wide range of complex manipulation tasks, as they allow robots to acquire more general-purpose skills. However, developing complex behaviours for such robots traditionally requires sophisticated control strategies, which in turn demand domain expertise and a good understanding of the underlying mathematics and physics; for robots of this complexity, this quickly becomes intractable. Learning algorithms such as deep reinforcement learning provide a general framework for robots to learn complex behaviours directly from data. Even for learning algorithms, however, dexterous manipulators pose a major challenge due to their high dimensionality and the contact-rich, under-actuated nature of object manipulation. To overcome these challenges, learning algorithms demand extensive amounts of data, which are often expensive or difficult to obtain. This has increased the need for more efficient and more effective algorithms that extract more useful information from the limited data available. This paper contributes Model Predictive Control-Soft Actor Critic (MPC-SAC), an algorithm that combines offline learning with online planning to derive a control policy. MPC-SAC is benchmarked on two challenging dexterous manipulation tasks against state-of-the-art model-free reinforcement learning algorithms, and is found to achieve asymptotic performance with state-of-the-art data efficiency on both tasks.