Orca-AgentInstruct: Agentic flows can be effective synthetic-data generators

Our work on Orca and Orca 2 demonstrated the power of using synthetic data for the post-training of small language models and getting them to levels of performance previously found only in much larger language models. Orca-AgentInstruct is another step in this direction, where we explore using agentic flows to generate diverse and high-quality data at scale. Orca-AgentInstruct is an agentic solution for synthetic-data generation. By leveraging an agentic framework, AgentInstruct can generate tailored datasets, comprising both prompts and responses, […]

Read more

Abstracts: November 14, 2024

TONG WANG: Thank you, Bonnie. KRUFT: Microsoft Research is one of the earliest institutions to apply AI in biomolecular simulation research. Why did the AI for Science team choose this direction, and—with this work specifically, AI2BMD—what problem are you and your coauthors addressing, and why should people know about it? WANG: So as Richard Feynman famously said, “Everything that living things do can be understood in terms of the jigglings and the wigglings of atoms.” To study the mechanisms behind […]

Read more

Using portable SIMD in stable Rust

In a previous post we saw that you can speed up code significantly on a single core using SIMD: Single Instruction Multiple Data. These specialized CPU instructions allow you to, for example, add 4 values at once with a single instruction, instead of the usual one value at a time. The performance improvement you get compounds with multi-core parallelism: you can benefit from both SIMD and threading at the same time. Unfortunately, SIMD instructions are specific both to CPU architecture […]

Read more

Python Dictionary Comprehensions: How and When to Use Them

Dictionary comprehensions are a concise and quick way to create, transform, and filter dictionaries in Python. They can significantly enhance your code’s conciseness and readability compared to using regular for loops to process your dictionaries. Understanding dictionary comprehensions is crucial for you as a Python developer because they’re a Pythonic tool for dictionary manipulation and can be a valuable addition to your programming toolkit. In this tutorial, you’ll learn how to: Create dictionaries using dictionary comprehensions Transform existing dictionaries with […]

Read more

Toward modular models: Collaborative AI development enables model accountability and continuous learning

Today, development of generalizable AI models requires access to sufficient data and compute resources, which may create challenges for some researchers. Democratizing access to technology across the research community can advance the development of generalizable AI models. By applying the core software development concept of modularity to AI, we can build models that are powerful, efficient, adaptable, and transparent.  Until recently, AI models were primarily built using monolithic architecture. Though powerful, these models can be challenging to customize and edit […]

Read more

Research Focus: Week of November 11, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. NEW RESEARCH Look Ma, no markers: holistic performance capture without the hassle Motion-capture technologies used in film and game production typically focus solely on face, body, or hand capture, requiring complex and expensive hardware and lots of manual intervention from skilled operators. While machine-learning-based  

Read more

Formatting Floats Inside Python F-Strings

You’ll often need to format and round a Python float to display the results of your calculations neatly within strings. In earlier versions of Python, this was a messy thing to do because you needed to round your numbers first and then use either string concatenation or the old string formatting technique to do this for you. Since Python 3.6, the literal string interpolation, more commonly known as a formatted string literal or f-string, allows you to customize the content […]

Read more

Preventing side-channels in the cloud

Cloud computing delivers scalable and cost-effective compute resources to a wide range of customers. The ability for cloud providers to share components of the hardware stack across customers, or tenants, is essential for running efficient cloud systems. For example, modern central processing units (CPUs) pack hundreds of physical hardware threads sharing terabytes of dynamic random-access memory (DRAM), which can be flexibly assigned to many independent virtual machines (VMs). Preventing tenants from snooping on others who  

Read more

Python News Roundup: November 2024

The latest Python developments all point to the same thing—Python is currently thriving. The recent GitHub Octoverse 2024 report has revealed that Python is now the most used language on GitHub. Also, last month saw the release of Python 3.13, which is already laying the groundwork for some exciting future improvements. While Python core developers have been busy exploring the language’s features as they tinker with upcoming enhancements, it’s good to know that working on Python’s source code isn’t the […]

Read more
1 2 3 4 907