Data Science AI-CV-NLP, Summer Intern


Company: Ancestry

What You Will Do:

Ancestry is looking for an exceptional, passionate, and highly motivated Data Science Intern to join our Data Science AI team this summer. The Data Science AI-CV-NLP team develops and combines generative AI, LLM, CV, and NLP models to extract and organize text and image information from billions of historical and genealogical records, helping customers discover and connect with their family history. As a Data Science Intern on the AI-CV-NLP team, you will build, train, and fine-tune models that support product development, customer success, and content creation across our Family History business. You will also work closely with engineering teams to train, optimize, and deploy models.

Implement state-of-the-art generative AI, NLP, LLM, and CV solutions for NER, relation extraction, summarization, topic analysis, entity resolution, knowledge graphs, embedding-based information retrieval, story generation, AI-driven chat, and related tasks across genealogical and historical collections such as newspapers, city directories, family history books, and birth, marriage, and death records

Analyze model performance and explore zero-shot/few-shot label generation to augment or replace iterative manual labeling when curating and refining training sets to improve model performance

Collaborate with ML Ops and Data Science Engineers to deploy datasets, truthsets, models, pipelines, and training and inference code to a cloud-based model registry

Effectively communicate and present deliverables and solutions to teams, stakeholders, and executives

Who You Are:

Candidate for an advanced degree (MS/PhD) in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering, or a related quantitative field

Specialization in generative AI, language models, computer vision, deep learning, or machine learning, with software development expertise

Experience with applied research through understanding and implementing published models and methods for practical application to real-world problems

Strong proficiency in Python and related AI, LLM, CV, and/or NLP tools and libraries, and familiarity with frameworks and libraries such as PyTorch, TensorFlow, Hugging Face, OpenAI, spaCy, the SciPy stack, and scikit-learn

Nice to Have:

Experience with LLMs, including training/fine-tuning, prompt engineering, RLHF, performance evaluation and cost analysis

Experience with NLP techniques such as named entity recognition, relationship extraction, document classification, document summarization, topic modeling, machine translation, sentiment analysis, and dialogue systems

Experience in document image processing, e.g., computer vision methods, image classification, object detection, segmentation, layout analysis, redaction, and handwriting recognition

Familiarity with NLP technologies such as NLTK, spaCy, pandas, and NumPy, along with an understanding of pre-trained language models and architectures like BERT (and variants), GPT, T5, XLNet, PL-Marker, TPLinker, OneRel, and Hugging Face and OpenAI models

Familiarity with LLMs and GenAI models such as LLaMA, Falcon, GPT*, BLIP, and CLIP

Internship Program Details:

Students must be enrolled in an accredited U.S. educational institution with a graduation date after August 2024.

Summer 2024 program dates are May 13 – September 6. Please note that there are three intern onboarding dates to choose from: May 13th, May 28th, and June 10th. Students may offboard on any Friday beginning August 9th. All internships must be wrapped up by September 6th.

FULLY PAID temporary housing and travel to and from the internship are provided.

All summer internships will be in Lehi, Utah. You will work a hybrid schedule that allows you to choose which days you come into the office and which days you work from your temporary housing or, for Utah students, from home.

Interns have the opportunity to network and partner with other interns and industry-leading professionals.

You will participate in engaging events, including executive speaker sessions, professional development, and our annual Intern Days to showcase your project and work.

You will be required to work a full-time schedule (40 hours/week), Monday-Friday.

A company-issued laptop and equipment will be provided for the duration of the internship program.

Our interns enjoy mentorship, challenging work, a great compensation package, temporary housing, and fun, captivating experiences. We have it all!

