Skip to content

Slow dataloading with big datasets issue persists #2252

Closed
@hwijeen

Description

@hwijeen

Hi,

I reported too slow data fetching when data is large(#2210) a couple of weeks ago, and @lhoestq referred me to the fix (#2122).
However, the problem seems to persist. Here is the profiled results:

  1. Running with 60GB
Action                             	|  Mean duration (s)	|Num calls      	|  Total time (s) 	|  Percentage %   	|
------------------------------------------------------------------------------------------------------------------------------------
Total                              	|  -              	|_              	|  517.96         	|  100 %          	|
------------------------------------------------------------------------------------------------------------------------------------
model_backward                     	|  0.26144        	|100            	|  26.144         	|  5.0475         	|
model_forward                      	|  0.11123        	|100            	|  11.123         	|  2.1474         	|
get_train_batch                    	|  0.097121       	|100            	|  9.7121         	|  1.8751         	|
  1. Running with 600GB, datasets==1.6.0
Action                             	|  Mean duration (s)	|Num calls      	|  Total time (s) 	|  Percentage %   	|
------------------------------------------------------------------------------------------------------------------------------------
Total                              	|  -              	|_              	|  4563.2         	|  100 %          	|
------------------------------------------------------------------------------------------------------------------------------------
get_train_batch                    	|  5.1279         	|100            	|  512.79         	|  11.237         	|
model_backward                     	|  4.8394         	|100            	|  483.94         	|  10.605         	|
model_forward                      	|  0.12162        	|100            	|  12.162         	|  0.26652        	|

I see that get_train_batch lags when data is large. Could this be related to different issues?
I would be happy to provide necessary information to investigate.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions