What is Pentaho?
Pentaho is an ETL tool for data engineering. It comes in both a community edition and an enterprise edition; which one you implement in your system depends on your company and project requirements. It is a very easy, user-friendly tool for the ETL process: you create transformations and jobs to handle your ETL tasks. Newer versions of Pentaho also ship Big Data plugins, so if you want to work with Hadoop you can still build your jobs in Pentaho.
Here I am describing the community edition of Pentaho!
Where to Download?
https://community.hitachivantara.com/s/article/downloads
How to install?
You just have to unzip the package to your preferred location on your machine.
Pentaho can be used on both the client and the server side. On a client machine you install Pentaho Data Integration; after that it is just one click to launch and use.
After that, you can just create a new job or a new transformation.
The Design tab lists all the available steps; choose the ones you need and drag and drop them into the transformation.
For example, if you have an Excel file and want to load its data into a database table, you can pick an input step and an output step from the Design tab and build the transformation from them.
Both steps have configuration dialogs asking for the input and output details; fill in that information and your transformation is ready to go.
After that, save your transformation locally and run it; the logs appear on your screen.
If it fails, the logs tell you why; if it ran successfully, you can check your database table for the loaded data.
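Since Pentaho Data Integration is a Java application, you can also run a saved transformation outside of the Spoon GUI through the Kettle Java API. This is a minimal sketch, assuming the Kettle libraries are on your classpath; the file name excel_to_table.ktr is just a placeholder for your own saved transformation:

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (loads the step and plugin registry)
        KettleEnvironment.init();

        // "excel_to_table.ktr" is a placeholder path to your saved transformation
        TransMeta transMeta = new TransMeta("excel_to_table.ktr");
        Trans trans = new Trans(transMeta);

        trans.execute(null);       // no command-line parameters
        trans.waitUntilFinished(); // block until every step has completed

        if (trans.getErrors() > 0) {
            System.err.println("Transformation failed, check the logs");
        }
    }
}
```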
We can do the same with a job.
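A job saved as a .kjb file can be run the same way; again a minimal sketch, with etl_pipeline.kjb as a placeholder name:

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class RunJob {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // "etl_pipeline.kjb" is a placeholder path to your saved job
        JobMeta jobMeta = new JobMeta("etl_pipeline.kjb", null); // no repository
        Job job = new Job(null, jobMeta);

        job.start();
        job.waitUntilFinished(); // block until all job entries have run

        if (job.getErrors() > 0) {
            System.err.println("Job failed, check the logs");
        }
    }
}
```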
So we have many options here for ETL, and we can design the pipeline as per our requirements.
If a driver is not available inside Pentaho, we can download the JDBC driver and put it inside the Pentaho directory.
For example, if you are trying to connect to a PostgreSQL database and you get an error message that the driver is missing, download the PostgreSQL JDBC driver and put it inside the lib directory.
Restart Pentaho and it's ready to go; now you can connect to your database. You have to follow the same steps for other databases.
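If you want to verify the driver itself outside of Pentaho, a plain JDBC connection test is enough. This is a minimal sketch; the host, port, database name, and credentials are placeholders you have to replace with your own:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class TestPostgresDriver {
    public static void main(String[] args) throws Exception {
        // Placeholder host, database, and credentials -- replace with your own
        String url = "jdbc:postgresql://localhost:5432/mydb";
        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            // If this prints, the driver was found and the connection works
            System.out.println("Connected to: " + conn.getMetaData().getURL());
        }
    }
}
```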
If you create a full pipeline for your ETL process, your complete Pentaho job will look like the image below, with many jobs and transformations chained together.
You can run your job locally or on a Pentaho Carte server.
For the client-server architecture of Pentaho, you can read my other blog post.