Job Purpose:
Analyzing, designing, developing and managing the infrastructure and the data that feed Data Science models.
The Data Engineer is expected to own the whole lifecycle of the datasets, including updates, backups, synchronization, and access policies.
Job Responsibilities:
- Manage the lifecycle (from data collection to archival) of ML/DL datasets and ensure their usability for the client’s data scientists.
- Design, build and integrate data from various sources.
- Design ETL pipelines with scripted components.
- Optimize data workflows, choosing the most cost-efficient approach.
- Automate the management of recurring tasks in the pipeline.
- Perform feasibility studies/analysis with a critical point of view.
- Provide support and maintenance, troubleshooting issues with data and applications.
- Develop technical documentation for applications, including diagrams and manuals.
- Work on many different software challenges, always ensuring a combination of simplicity and maintainability within the code.
- Contribute to architectural designs of large complexity and size, potentially involving several distinct components.
- Work closely with data scientists and a variety of end users (across different cultures) to ensure technical compatibility and user satisfaction.
- Work as a member of a team, encouraging team building and motivation, and cultivating an effective team spirit.
Job Requirements:
- Bachelor’s degree in Computer Engineering.
- Demonstrated experience and knowledge in Big Data and NoSQL databases.
- Demonstrated experience and knowledge in Object-Oriented Programming.
- Demonstrated experience and knowledge in distributed systems.
- Proficiency in Python.
- Experience designing and implementing data warehouses.
- Experience developing ETL pipelines.
- Experience working with distributed storage systems in the cloud (Azure, GCP or AWS).
- Experience with collaborative development tools such as Git, Confluence, Jira, etc.
- Good problem-solving capabilities.
- Strong analytical and synthesis skills (good analytical and logical thinking).
- Proactive, solution-oriented attitude; used to working in a team and managing deadlines.
- Ability to learn quickly.
- Familiarity with agile development methodologies (Scrum/Kanban).
- Ability to communicate effectively in English, both written and spoken.
Preferred Qualifications:
- Master’s degree in Data Engineering or a related field.
- Experience managing deep learning datasets.
- Experience managing Cassandra.
- Experience working with Spark.
- Experience implementing CI/CD pipelines for automation.
About Blackstraw:
Conceptualized in 2015 and in full-time operation since 2018, Blackstraw LLC is a software products and services company specializing in Artificial Intelligence (AI) and Machine Learning solutions for various industries. We support businesses across North America, Europe and Asia, working to simplify AI implementation through our platform, which expedites data labelling, AI model training, and cloud or on-premise deployments.
With more than 100 years of combined work experience, the 100-plus-strong Blackstraw team comprises experts across the AI value chain. We are a fast-moving team that prides itself on rapidly identifying use cases and fine-tuning our products to suit specific business needs.
We are focused on providing solutions in computer vision, natural language processing, data annotation tools for deep learning models, and more. To stay competitive, organizations must adopt and implement smart AI solutions and service offerings. However, most companies are unable to implement AI rapidly due to the complexity of existing solutions, inadequate data, and cost implications.
Our mission is to enable enterprises to adopt AI in an easy, cost-effective, and time-efficient manner with a plug-and-play approach to their data.
Blackstraw’s operations are based in Mumbai, Pune, and Chennai, India.