Work partitioning means dividing a large amount of work among multiple workers. If a job would take one worker 100 hours, you can split it among, say, 10 workers who work in parallel, and in the ideal case the job finishes in about 10 hours. In short, work partitioning helps work complete faster.
One can apply work partitioning to process data faster by dividing the work among a set of machines (as in big data, where high-volume data is processed by machines located on premises or in the cloud), or to handle incoming requests across a set of machines with the help of load balancers. In this post, however, I am going to cover partitioning work inside a single process/application using the Task Parallel Library (TPL) in C#.
Below is code that uses the Task Parallel Library to process a large number of records, here 100,000 fake records (a real data source could hold millions).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public List<TransFormedCustomer> GetProcessData()
{
    var processRecordCount = 10000; // number of records handled by one task
    var retModels = new List<TransFormedCustomer>();
    // e.g. a large record set from a data source such as a database, web service, or CSV file
    var lotOfRecords = GetFakeData();

    // Number of tasks = record count / chunk size, rounded up
    int tasksCount = lotOfRecords.Count / processRecordCount;
    if (lotOfRecords.Count % processRecordCount > 0)
        tasksCount++;

    var tasks = new Task<List<TransFormedCustomer>>[tasksCount];
    for (int i = 0; i < tasksCount; i++)
    {
        int skip = processRecordCount * i;
        // The last chunk may hold fewer than processRecordCount records
        int take = Math.Min(processRecordCount, lotOfRecords.Count - skip);
        // Declared inside the loop so that each task's lambda captures its own chunk
        IEnumerable<Customer> datas = lotOfRecords.Skip(skip).Take(take);

        // (skip, take) is passed as task state so a failed chunk can be identified later
        tasks[i] = Task.Factory.StartNew((obj) => GetProccessedData(datas), Tuple.Create(skip, take));
#if DEBUG
        Console.WriteLine($"task no {i} started, reading from {skip} to {skip + take}");
#endif
    }

    try
    {
        Task.WaitAll(tasks);
    }
    catch (AggregateException)
    {
        // left empty: failures are logged in the foreach block below
    }

    foreach (var task in tasks)
    {
        if (task.IsCanceled || task.IsFaulted)
        {
            var state = (Tuple<int, int>)task.AsyncState;
            Console.WriteLine($"task failed to process data from {state.Item1} to {state.Item1 + state.Item2}");
        }
        else
            retModels.AddRange(task.Result);
    }
    return retModels;
}

The code above works as follows:
- It sets the number of records one task processes; in the code this is 10000, stored in the "processRecordCount" variable.
- The code gets the records from the data source; here the "GetFakeData()" method (listed at the end) returns 100,000 customer records.
- The "tasksCount" variable stores the number of tasks needed to process the received records (each task runs "GetProccessedData(datas)", listed below); here that is 100000/10000 = 10 tasks, with one extra task added whenever there is a remainder.
- In the for loop, "Skip" and "Take" give each task its own chunk of up to 10000 records to process.
- "Task.WaitAll()" waits for all tasks to finish their work.
- After the tasks finish, the processed records from each task are added to the "retModels" collection.
- If a task fails to process its records, the range it was responsible for is logged to the console.
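The chunking arithmetic in the steps above can be looked at on its own, without any tasks. The following is a minimal sketch rather than part of the original code: "ChunkMath" and "GetRanges" are hypothetical names introduced for illustration, and the method only computes the (skip, take) pair that each task would receive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not part of the post's code).
public static class ChunkMath
{
    // Returns the (skip, take) pair for every chunk of a collection holding
    // `total` items, split into chunks of at most `chunkSize` items.
    public static List<(int Skip, int Take)> GetRanges(int total, int chunkSize)
    {
        if (total < 0) throw new ArgumentOutOfRangeException(nameof(total));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // Ceiling division: one extra chunk when there is a remainder.
        int chunkCount = (total + chunkSize - 1) / chunkSize;

        var ranges = new List<(int Skip, int Take)>(chunkCount);
        for (int i = 0; i < chunkCount; i++)
        {
            int skip = i * chunkSize;
            // The last chunk may be shorter than chunkSize.
            int take = Math.Min(chunkSize, total - skip);
            ranges.Add((skip, take));
        }
        return ranges;
    }
}
```

With 100,000 records and a chunk size of 10,000 this yields 10 ranges, the first being (0, 10000) and the last (90000, 10000). With 100,005 records an eleventh range, (100000, 5), covers the remainder.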
Output
- A very large amount of data gets processed faster because it is processed in parallel by tasks, i.e. workers.
- It makes use of a multicore CPU, i.e. computes efficiently: the tasks run on thread-pool threads that can execute on different cores at the same time.
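How much faster the partitioned version runs depends on the machine and on how expensive the per-record work is, so it is worth measuring rather than assuming. The sketch below is an illustration under stated assumptions, not the code from this post: "SpeedCheck", "Work", "Run", and the input size are made up, and "Work" only simulates CPU-bound processing of one record.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical timing harness, for illustration only.
public static class SpeedCheck
{
    // Stand-in for CPU-bound per-record processing (an assumption).
    public static long Work(int value)
    {
        long acc = value;
        for (int i = 0; i < 200; i++) acc = (acc * 31 + i) % 1000003;
        return acc;
    }

    // Processes all items on the calling thread.
    public static long Sequential(int[] items)
    {
        long sum = 0;
        foreach (var item in items) sum += Work(item);
        return sum;
    }

    // Splits the items into chunks and processes each chunk in its own task.
    public static long Partitioned(int[] items, int chunkSize)
    {
        int chunkCount = (items.Length + chunkSize - 1) / chunkSize;
        var tasks = new Task<long>[chunkCount];
        for (int i = 0; i < chunkCount; i++)
        {
            // Locals declared per iteration, so each lambda captures its own bounds.
            int start = i * chunkSize;
            int end = Math.Min(start + chunkSize, items.Length);
            tasks[i] = Task.Run(() =>
            {
                long sum = 0;
                for (int j = start; j < end; j++) sum += Work(items[j]);
                return sum;
            });
        }
        Task.WaitAll(tasks);
        return tasks.Sum(t => t.Result);
    }

    // Times both variants and checks that they produce the same result.
    public static void Run()
    {
        var items = Enumerable.Range(0, 1_000_000).ToArray();

        var sw = Stopwatch.StartNew();
        long a = Sequential(items);
        Console.WriteLine($"sequential:  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        long b = Partitioned(items, 10_000);
        Console.WriteLine($"partitioned: {sw.ElapsedMilliseconds} ms");

        Console.WriteLine($"results match: {a == b}");
    }
}
```

Calling "SpeedCheck.Run()" from a console application's Main method prints both timings. On a single-core machine, or when the per-record work is very cheap, the partitioned version may show little or no gain because of task scheduling overhead.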
GetFakeData method
Fake data is generated with Faker.Net (install it with "install-package Faker.Net").
private List<Customer> GetFakeData()
{
    var customers = new List<Customer>();
    for (int i = 0; i < 100000; i++)
    {
        customers.Add(
            new Customer()
            {
                FirstName = Faker.Name.First(),
                LastName = Faker.Name.Last(),
                EmailAddress = Faker.Internet.Email(),
                SSN = Faker.Identification.SocialSecurityNumber()
            }
        );
    }
    return customers;
}
GetProccessedData method

using System.Text.RegularExpressions;

private List<TransFormedCustomer> GetProccessedData(IEnumerable<Customer> customers)
{
    var transFormedCustomers = new List<TransFormedCustomer>();
    // Created once per call instead of once per record
    var regex = new Regex(@"^([\w\.\-]+)@([\w\-]+)((\.(\w){2,3})+)$");

    foreach (var customer in customers)
    {
        // do processing on received data:
        // enrich the data and add it to the processed collection

        // Assumes a 9-digit SSN without dashes, e.g. 123456789 -> 123-45-6789
        string formattedSSN = customer.SSN.Insert(5, "-").Insert(3, "-");
        string formattedName = customer.FirstName + ' ' + customer.LastName;

        // Replace email addresses that fail validation with a placeholder
        string email = customer.EmailAddress;
        Match match = regex.Match(email);
        if (!match.Success)
            email = "dummy@dummy.com";

        var transFormedCustomer = new TransFormedCustomer()
        {
            Name = formattedName,
            SSN = formattedSSN,
            EmailAddress = email
        };
        transFormedCustomers.Add(transFormedCustomer);
    }
    return transFormedCustomers;
}

Customer & TransFormedCustomer model classes
public class Customer
{
    public string EmailAddress { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string SSN { get; set; }
}

public class TransFormedCustomer
{
    public string EmailAddress { get; set; }
    public string Name { get; set; }
    public string SSN { get; set; }
}