[TOS tutorial 03] Sorting a File
In this tutorial, you use a processing component to sort the data read from a file.
This tutorial uses Talend Open Studio for Data Integration version 6.
1. Create a new Job
- Ensure that the Integration perspective is selected.
- Create a new Job and name it SortCSVFile.
The Job Designer opens an empty Job.
2. Add and configure a tFileInputDelimited component
- Add a tFileInputDelimited component to the Job.
- To configure the tFileInputDelimited_1 component, in its Component view, click [...] next to the FileName field, select the file on your local disk, and click Open.
- To describe the structure of the file, open the Schema wizard: in the Component view of tFileInputDelimited_1, click [...] next to the Edit schema field.
- Click the [+] icon to add the first column and enter the details for the column.
- Repeat the previous step for each column in the CSV file, then close the Schema wizard.
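Talend Jobs run as generated Java code. The Java sketch below is not Talend's generated code; it only approximates what tFileInputDelimited_1 does with the schema you just defined: it splits each line on a field separator and converts each field to a typed value. It assumes a hypothetical two-column schema (title as a String, releaseYear as an int), a semicolon separator, and one header line; the class and record names are also hypothetical. Adjust them to match your own file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ReadMovies {
    // Hypothetical schema: the real file may have more columns.
    record Movie(String title, int releaseYear) {}

    // Parse delimited lines into typed rows, skipping the given number of header lines.
    static List<Movie> parse(List<String> lines, String separator, int headerLines) {
        List<Movie> rows = new ArrayList<>();
        for (int i = headerLines; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isBlank()) {
                continue; // ignore empty lines
            }
            // Pattern.quote treats the separator literally, not as a regex.
            String[] fields = line.split(Pattern.quote(separator), -1);
            rows.add(new Movie(fields[0].trim(), Integer.parseInt(fields[1].trim())));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the source file.
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        parse(lines, ";", 1).forEach(System.out::println);
    }
}
```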
3. Sort the data in your Job
- Add a tSortRow component to the Job and link the two components. Note: The schema of the tFileInputDelimited_1 component is inherited by the linked tSortRow component, so you do not need to configure it.
- To view the inherited schema, in the Component view of tSortRow_1, click [...] next to Edit schema.
- To create a sorting rule based on the movie release year, click [+], select releaseYear in the Schema column, and set the sort order to desc.
- To view the result of the sorting rule, add a tLogRow component in the Job Designer and link tSortRow_1 to tLogRow_1.
- To run the Job, in the Run view for the SortCSVFile Job, click Run.
The movies from the source file are displayed in the Run view, sorted by release year in descending order.
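The single sorting rule above amounts to a numeric sort on releaseYear in descending order. The following Java sketch, which is not Talend's generated code, shows the same idea with a Comparator and prints each row roughly as tLogRow_1 does. The Movie record, the class name, and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByYear {
    // Hypothetical row type with the two columns used in this tutorial.
    record Movie(String title, int releaseYear) {}

    // Numeric sort on releaseYear, descending (newest first).
    static List<Movie> sortByYearDesc(List<Movie> rows) {
        List<Movie> sorted = new ArrayList<>(rows); // leave the input list untouched
        sorted.sort(Comparator.comparingInt(Movie::releaseYear).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Movie> rows = List.of(
                new Movie("Heat", 1995),
                new Movie("Alien", 1979),
                new Movie("Se7en", 1995));
        // Print one row per line, loosely mimicking tLogRow output in the Run view.
        sortByYearDesc(rows).forEach(m -> System.out.println(m.title() + "|" + m.releaseYear()));
    }
}
```

Running main prints Heat|1995, Se7en|1995, and Alien|1979. Because List.sort is stable, movies from the same year keep their input order in this sketch; a single rule does not order them any further.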
4. Add a second sort rule
- To add a second sorting rule, in the Component view of tSortRow_1, click [+], choose title in the Schema column, and then choose alpha in the sort column.
- To run the Job, in the Run view, click Run.
The movies are now sorted by release year and, within each year, alphabetically by title.
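With the second rule, the sort becomes a two-key comparison: releaseYear descending first, then title ascending to break ties within a year. A minimal Java sketch of that ordering follows. It assumes a hypothetical Movie record and class name, and it treats alpha as Java's natural String ordering, which is case-sensitive and may differ from how tSortRow compares text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortByYearThenTitle {
    // Hypothetical row type with the two columns used in this tutorial.
    record Movie(String title, int releaseYear) {}

    // First key: releaseYear, descending. Second key: title, ascending (alphabetical).
    static final Comparator<Movie> BY_YEAR_THEN_TITLE =
            Comparator.comparingInt(Movie::releaseYear).reversed()
                      .thenComparing(Movie::title);

    static List<Movie> sort(List<Movie> rows) {
        List<Movie> sorted = new ArrayList<>(rows); // leave the input list untouched
        sorted.sort(BY_YEAR_THEN_TITLE);
        return sorted;
    }

    public static void main(String[] args) {
        List<Movie> rows = List.of(
                new Movie("Se7en", 1995),
                new Movie("Alien", 1979),
                new Movie("Heat", 1995));
        sort(rows).forEach(m -> System.out.println(m.title() + "|" + m.releaseYear()));
    }
}
```

Here Heat and Se7en share 1995, so the title key places Heat first regardless of input order; Alien (1979) comes last.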
5. Store the result of the Job in a file
- Add a tFileOutputExcel component to the Job Designer and link tLogRow_1 to it.
- To configure the output component, in the Component view of the component, specify the path and name for the output file.
- To include the header row in the output file, select the Include Header check box.
- To run the Job, in the Run view, click Run.
- To check the result, navigate to the folder where moviesSorted.xls was created and open the file. It contains the sorted data.
- To prevent the sorted data from being displayed in the Run view, right-click tLogRow_1 and click Deactivate tLogRow.
- To run the Job, in the Run view, click Run.
The Job is run again. However, no data is displayed in the Run view.
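tFileOutputExcel writes an Excel workbook, which requires a library outside the JDK. As a rough stand-in, the following Java sketch writes the sorted rows to a semicolon-delimited text file instead, with an optional header row that plays the role of the Include Header option. The Movie record, the class name, the column names in the header, and the moviesSorted.csv file name are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class WriteMovies {
    // Hypothetical row type with the two columns used in this tutorial.
    record Movie(String title, int releaseYear) {}

    // Turn rows into delimited lines, optionally starting with a header row.
    static List<String> toLines(List<Movie> rows, String separator, boolean includeHeader) {
        List<String> lines = new ArrayList<>();
        if (includeHeader) {
            lines.add("title" + separator + "releaseYear");
        }
        for (Movie m : rows) {
            lines.add(m.title() + separator + m.releaseYear());
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        List<Movie> sorted = List.of(new Movie("Heat", 1995), new Movie("Alien", 1979));
        // Hypothetical output path; the tutorial's Job writes moviesSorted.xls instead.
        Files.write(Path.of("moviesSorted.csv"), toLines(sorted, ";", true));
    }
}
```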